INFERENCE ON COUNTERFACTUAL DISTRIBUTIONS

VICTOR CHERNOZHUKOV    IVAN FERNANDEZ-VAL    BLAISE MELLY


Abstract.  In  this  paper  we  develop  procedures  for  performing  inference  in  regression 
models  about  how  potential  policy  interventions  affect  the  entire  marginal  distribution  of 
an  outcome  of  interest.  These  policy  interventions  consist  of  either  changes  in  the  dis- 
tribution of  covariates  related  to  the  outcome  holding  the  conditional  distribution  of  the 
outcome  given  covariates  fixed,  or  changes  in  the  conditional  distribution  of  the  outcome 
given  covariates  holding  the  marginal  distribution  of  the  covariates  fixed.  Under  either  of 
these  assumptions,  we  obtain  uniformly  consistent  estimates  and  functional  central  limit 
theorems  for  the  counterfactual  and  status  quo  marginal  distributions  of  the  outcome 
as  well  as  other  function-valued  effects  of  the  policy,  including,  for  example,  the  effects 
of  the  policy  on  the  marginal  distribution  function,  quantile  function,  and  other  related 
functionals.  We  construct  simultaneous  confidence  sets  for  these  functions;  these  sets  take 
into  account  the  sampling  variation  in  the  estimation  of  the  relationship  between  the  out- 
come and  covariates.  Our  procedures  rely  on,  and  our  theory  covers,  all  main  regression 
approaches  for  modeling  and  estimating  conditional  distributions,  focusing  especially  on 
classical,  quantile,  duration,  and  distribution  regressions.  Our  procedures  are  general  and 
accommodate  both  simple  unitary  changes  in  the  values  of  a  given  covariate  as  well  as 
changes  in  the  distribution  of  the  covariates  or  the  conditional  distribution  of  the  outcome 
given  covariates  of  general  form.  We  apply  the  procedures  to  examine  the  effects  of  labor 
market  institutions  on  the  U.S.  wage  distribution. 

Key  Words:    Policy  effects,  counterfactual  distribution,  quantile  regression,  duration 
regression, distribution regression

1. Introduction

A  basic  objective  in  empirical  economics  is  to  predict  the  effect  of  a  potential  policy 
intervention  or  a  counterfactual  change  in  economic  conditions  on  some  outcome  variable 
of  interest.  For  example,  we  might  be  interested  in  what  the  wage  distribution  would  be 
in  2000  if  workers  have  the  same  characteristics  as  in  1990,  what  the  distribution  of  infant 
birth  weights  would  be  for  black  mothers  if  they  receive  the  same  amount  of  prenatal  care 
as white mothers, what the distribution of consumer expenditure would be if we change
the  income  tax,  or  what  the  distribution  of  housing  prices  would  be  if  we  clean  up  a  local 
hazardous  waste  site.  In  other  examples,  we  might  be  interested  in  what  the  distribution 
of  wages  for  female  workers  would  be  in  the  absence  of  gender  discrimination  in  the  labor 
market  (e.g.,  if  female  workers  are  paid  as  male  workers  with  the  same  characteristics), 
or  what  the  distribution  of  wages  for  black  workers  would  be  in  the  absence  of  racial 
discrimination  in  the  labor  market  (e.g.,  if  black  workers  are  paid  as  white  workers  with 
the  same  characteristics).  More  generally,  we  can  think  of  a  policy  intervention  either 
as  a  change  in  the  distribution  of  a  set  of  explanatory  variables  X  that  determine  the 
outcome variable of interest Y, or as a change in the conditional distribution of Y given
X. Policy analysis consists of estimating the effect on the distribution of Y of a change in
the distribution of X or in the conditional distribution of Y given X.

In  this  paper  we  develop  procedures  to  perform  inference  in  regression  models  about 
how these counterfactual policy interventions affect the entire marginal distribution of Y.
The main assumption is that either the policy does not alter the conditional distribution
of Y given X and only alters the marginal distribution of X, or that the policy does not
alter the marginal distribution of X and only alters the conditional distribution of Y given
X. Starting from estimates of the conditional distribution or quantile functions of the
outcome given covariates, we obtain uniformly consistent estimates for functionals of the
marginal distribution function of the outcome before and after the intervention. Examples
of these functionals include distribution functions, quantile functions, quantile policy effects,
distribution policy effects, means, variances, and Lorenz curves. We then construct
confidence  sets  around  these  estimates  that  take  into  account  the  sampling  variation  com- 
ing from  the  estimation  of  the  conditional  model.  These  confidence  sets  are  uniform  in  the 
sense  that  they  cover  the  entire  functional  of  interest  with  pre-specified  probability.  Our 
analysis  specifically  targets  and  covers  the  principal  approaches  to  estimating  conditional 
distribution  models  most  often  used  in  empirical  work,  including  classical,  quantile,  du- 
ration, and  distribution  regressions.    Moreover,  our  approach  can  be  used  to  analyze  the 


effect  of  both  simple  interventions  consisting  of  unitary  changes  in  the  values  of  a  given 
covariate  as  well  as  more  elaborate  policies  consisting  of  general  changes  in  the  covariate 
distribution  or  in  the  conditional  distribution  of  the  outcome  given  covariates.  Moreover, 
the  counterfactual  distribution  of  X  and  conditional  distribution  of  Y  given  X  can  corre- 
spond to  known  transformations  of  these  distributions  or  to  the  distributions  in  a  different 
subpopulation  or  group.  This  array  of  alternatives  allows  us  to  answer  a  wide  variety  of 
policy  questions  such  as  the  ones  mentioned  in  the  first  paragraph. 

To  develop  the  inference  results,  we  establish  the  functional  (Hadamard)  differentiability 
of  the  marginal  distribution  functions  before  and  after  the  policy  with  respect  to  the  limit 
of  the  functional  estimators  of  the  conditional  model  of  the  outcome  given  the  covariates. 
This  result  allows  us  to  derive  the  asymptotic  distribution  for  the  functionals  of  interest 
taking  into  account  the  sampling  variation  coming  from  the  first  stage  estimation  of  the 
relationship  between  the  outcome  and  covariates  by  means  of  the  functional  delta  method. 
Moreover,  this  general  approach  based  on  functional  differentiability  allows  us  to  establish 
the  validity  of  convenient  resampling  methods,  such  as  bootstrap  and  other  simulation 
methods,  to  make  uniform  inference  on  the  functionals  of  interest.  Because  our  analysis 
relies  only  on  the  conditional  quantile  estimators  or  conditional  distribution  estimators 
satisfying  a  functional  central  limit  theorem,  it  applies  quite  broadly  and  we  show  it  covers 
the  major  regression  methods  listed  above.  As  a  consequence,  we  cover  a  wide  array  of 
techniques,  though  in  the  discussion  we  devote  attention  primarily  to  the  most  practical 
and  commonly  used  methods  of  estimating  conditional  distribution  and  quantile  functions. 

This  paper  contributes  to  the  previous  literature  on  estimating  policy  effects  using  re- 
gression methods.  In  particular,  important  developments  include  the  work  of  Stock  (1989), 
which  introduced  regression-based  estimators  to  evaluate  the  mean  effect  of  policy  inter- 
ventions, and  of  Gosling,  Machin,  and  Meghir  (2000)  and  Machado  and  Mata  (2005), 
which  proposed  quantile  regression-based  policy  estimators  to  evaluate  distributional  ef- 
fects of  policy  interventions,  but  did  not  provide  distribution  or  inference  theory  for  these 
estimators.  Our  paper  contributes  to  this  literature  by  providing  regression-based  policy 
estimators  to  evaluate  quantile,  distributional,  and  other  effects  (e.g.,  Lorenz  and  Gini 
effects) of a general policy intervention and by deriving functional limit theory as well
as practical inferential tools for these policy estimators. Our policy estimators are based
on a rich variety of regression models for the conditional distribution, including classical,
quantile, duration, and distribution regressions.¹ In particular, our theory covers the previous
estimators of Gosling, Machin, and Meghir (2000) and Machado and Mata (2005) as
important  special  cases.  In  fact,  our  limit  theory  is  generic  and  applies  to  any  estimator  of 
the  conditional  distribution  that  satisfies  a  functional  central  limit  theorem.  Accordingly, 
we  cover  not  only  a  wide  array  of  the  most  practical  current  approaches  for  estimating 
conditional  distributions,  but  also  many  other  existing  and  future  approaches,  including, 
for  example,  approaches  that  accommodate  endogeneity  (Abadie,  Angrist,  and  Imbens, 
2002, Chesher, 2003, Chernozhukov and Hansen, 2005, and Imbens and Newey, 2009).²

Our  paper  is  also  related  to  the  literature  that  evaluates  policy  effects  and  treatment 
effects  using  propensity  score  methods.  The  influential  article  of  DiNardo,  Fortin,  and 
Lemieux  (1996)  developed  estimators  for  counterfactual  densities  using  propensity  score 
reweighting  in  the  spirit  of  Horvitz  and  Thompson  (1952).  Important  related  work  by 
Hirano,  Imbens,  and  Ridder  (2003)  and  Firpo  (2007)  used  a  similar  reweighting  approach 
in  exogenous  treatment  effects  models  to  construct  efficient  estimators  of  average  and 
quantile  treatment  effects,  respectively.  As  we  comment  later  in  the  paper,  it  is  possible 
to  adapt  the  reweighting  methods  of  these  articles  to  develop  policy  estimators  and  limit 
theory  for  such  estimators.  Here,  however,  we  focus  on  developing  inferential  theory  for 
policy  estimators  based  on  regression  methods,  thus  supporting  empirical  research  using 
regression  techniques  as  its  primary  method  (Buchinsky,  1994,  Chamberlain,  1994,  Han 
and  Hausman,  1990,  Machado  and  Mata,  2005).  The  recent  book  of  Angrist  and  Pischke 
(2008,  Chap.  3)  provides  a  nice  comparative  discussion  of  regression  and  propensity  score 
methods.  Finally,  a  related  work  by  Firpo,  Fortin,  and  Lemieux  (2007)  studied  the  effects 
of  special  policy  interventions  consisting  of  marginal  changes  in  the  values  of  the  covari- 
ates.  As  we  comment  later  in  the  paper,  their  approach,  based  on  a  linearization  of  the 
functionals  of  interest,  is  quite  different  from  ours.  In  particular,  our  approach  focuses 
on more general non-marginal changes in both the marginal distribution of covariates and
conditional  distribution  of  the  outcome  given  covariates. 


¹ We focus on semi-parametric estimators due to their dominant role in empirical work (Angrist and
Pischke, 2008). In contrast, fully nonparametric estimators are practical only in situations with a small
number of regressors. In future work, however, we hope to extend the analysis to nonparametric estimators.

² In this case, the literature provides estimators for $F_{Y_d}$, the distribution of potential outcome Y under
treatment d, and $F_{D,Z}$, the joint distributions of (endogenously determined) treatment status D and
exogenous regressors Z before and after policy. As long as the estimator of $F_{Y_d}$ satisfies the functional
central limit theorem specified in the main text and the estimator of $F_{D,Z}$ satisfies the functional central
limit theorem specified in Appendix D, our inferential theory applies to the resulting policy estimators.


We  illustrate  our  estimation  and  inference  procedures  with  an  analysis  of  the  evolution  of 
the  U.S.  wage  distribution.  Our  analysis  is  motivated  by  the  influential  article  by  DiNardo, 
Fortin,  and  Lemieux  (1996),  which  studied  the  institutional  and  labor  market  determinants 
of  the  changes  in  the  wage  distribution  between  1979  and  1988  using  data  from  the  CPS. 
We  complement  and  complete  their  analysis  by  using  a  wider  range  of  techniques,  including 
quantile  regression  and  distribution  regression,  providing  standard  errors  for  the  estimates 
of  the  main  effects,  and  extending  the  analysis  to  the  entire  distribution  using  simultaneous 
confidence  bands.  Our  results  reinforce  the  importance  of  the  decline  in  the  real  minimum 
wage  in  explaining  the  increase  in  wage  inequality.  They  also  indicate  the  importance  of 
changes  in  both  the  composition  of  the  workforce  and  the  returns  to  worker  characteristics 
in  explaining  the  evolution  of  the  entire  wage  distribution.  Our  results  show  that,  after 
controlling  for  other  composition  effects,  the  process  of  de-unionization  during  the  80s 
played  a  minor  role  in  explaining  the  evolution  of  the  wage  distribution. 

We organize the rest of the paper as follows. In Section 2 we describe methods for
performing counterfactual analysis, set up the modeling assumptions for the counterfactual
outcomes, and introduce the policy estimators. In Section 3 we derive distributional
results and inferential procedures for the policy estimators. In Section 4 we present the
empirical application, and in Section 5 we give a summary of the main results. In the
Appendix, we include proofs and additional theoretical results.


2. Methods for Counterfactual Analysis

2.1.  Observed  and  counterfactual  outcomes.  In  our  analysis  it  is  important  to  distin- 
guish between  observed  and  counterfactual  outcomes.  Observed  outcomes  come  from  the 
population  before  the  policy  intervention,  whereas  (unobserved)  counterfactual  outcomes 
come  from  the  population  after  the  potential  policy  intervention.  We  use  the  observed 
outcomes  and  covariates  to  establish  the  relationship  between  outcome  and  covariates  and 
the  distribution  of  the  covariates,  which,  together  with  either  a  postulated  distribution  of 
the  covariates  under  the  policy  or  a  postulated  conditional  distribution  of  outcomes  given 
covariates  under  the  policy,  determine  the  distribution  of  the  outcome  after  the  policy 
intervention,  under  conditions  precisely  stated  below. 

We divide our population into two groups or subpopulations indexed by $j \in \{0, 1\}$. Index
0 corresponds to the status quo or reference group, whereas index 1 corresponds to the
group from which we obtain the marginal distribution of X or the conditional distribution
of Y given X to generate the counterfactual outcome distribution.³ In order to discuss
various regression models of outcomes given covariates, it is convenient to consider the
following representation. Let $Q_{Y_j}(u|x)$ be the conditional u-quantile of Y given X in
group j, and let $F_{X_k}$ be the marginal distribution of the p-vector of covariates X in group
k, for $j, k \in \{0, 1\}$. We can describe the observed outcome $Y_j^j$ in group j as a function of
covariates and a non-additive disturbance $U_j^j$ via the Skorohod representation:

$$Y_j^j = Q_{Y_j}(U_j^j \mid X_j), \quad \text{where } U_j^j \sim U(0,1) \text{ independently of } X_j \sim F_{X_j}, \text{ for } j \in \{0,1\}.$$

Here  the  conditional  quantile  function  plays  the  role  of  a  link  function.  More  generally 
we can think of $Q_{Y_j}(u|x)$ as a structural or causal function mapping the covariates and
the  disturbance  to  the  outcome,  where  the  covariate  vector  can  include  control  variables 
to  account  for  endogeneity.  In  the  classical  regression  model,  the  disturbance  is  separable 
from  the  covariates,  as  in  the  location  shift  model  described  below,  but  generally  it  need 
not  be.  Our  analysis  will  cover  either  case. 

We  consider  two  different  counterfactual  experiments.  The  first  experiment  consists 
of drawing the vector of covariates from the distribution of covariates in group 1, i.e.,
$X_1 \sim F_{X_1}$, while keeping the conditional quantile function as in group 0, $Q_{Y_0}(u|x)$. The
counterfactual outcome $Y_0^1$ is therefore generated by

$$Y_0^1 := Q_{Y_0}(U_0^1 \mid X_1), \quad \text{where } U_0^1 \sim U(0,1) \text{ independently of } X_1 \sim F_{X_1}. \tag{2.1}$$

This construction assumes that we can evaluate the quantile function $Q_{Y_0}(u|x)$ at each
point x in the support of $X_1$. This requires that either the support of $X_1$ is a subset of
the support of $X_0$ or we can extrapolate the quantile function outside the support of $X_0$.
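As an illustration of the first experiment, the following minimal Python sketch draws counterfactual outcomes as in (2.1): covariates come from group 1, the conditional quantile function comes from group 0, and the rank disturbance is an independent uniform draw. The quantile function `q_y0` and the covariate sample `x1_sample` are hypothetical choices made only for this illustration; they are not taken from the paper.

```python
import random

def q_y0(u, x):
    """Hypothetical group-0 conditional quantile function: Y | X = x is uniform on [x, x + 1]."""
    return x + u

def draw_counterfactual(q, x_sample, n_draws, seed=0):
    """Draw (x, y) pairs with x from the group-1 sample and y = q(u, x), u ~ U(0, 1) independent of x."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        x = rng.choice(x_sample)   # covariate drawn from the empirical distribution of group 1
        u = rng.random()           # rank disturbance, independent of x
        draws.append((x, q(u, x)))
    return draws

x1_sample = [0.0, 1.0, 1.0, 2.0]   # illustrative group-1 covariates (assumption)
draws = draw_counterfactual(q_y0, x1_sample, n_draws=1000)
```

The support requirement stated above shows up here as the need for `q_y0` to be defined at every value in `x1_sample`.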

For  purposes  of  analysis,  it  is  useful  to  distinguish  two  different  ways  of  constructing 
the  alternative  distributions  of  the  covariates.  (1)  The  covariates  before  and  after  the 
policy  arise  from  two  different  populations  or  subpopulations.  These  populations  might 
correspond  to  different  demographic  groups,  time  periods,  or  geographic  locations.  Spe- 
cific examples  include  the  distributions  of  worker  characteristics  in  different  years  and 
distributions  of  socioeconomic  characteristics  for  black  versus  white  mothers.  (2)  The 
covariates under the policy intervention arise as some known transformation of the covariates
in group 0; that is, $X_1 = g(X_0)$, where $g(\cdot)$ is a known function. This case covers, for
example, unitary changes in the location of one of the covariates,

$$X_1 = X_0 + e_j,$$

where $e_j$ is a unitary p-vector with a one in the position j; or mean preserving redistributions
of the covariates implemented as $X_1 = (1 - \alpha) E[X_0] + \alpha X_0$. These types of policies
are useful for estimating the effect of smoking on the marginal distribution of infant birth
weights, the effect of a change in taxation on the marginal distribution of food expenditure,
or the effect of cleaning up a local hazardous waste site on the marginal distribution
of housing prices (Stock, 1991). Even though these two cases correspond to conceptually
different thought experiments, our econometric analysis will cover either situation within
a unified framework.

³ Our results also cover the policy intervention of changing both the marginal distribution of X and
the conditional distribution of Y given X. In this case the counterfactual outcome corresponds to the
observed outcome in group 1.
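The two known transformations of the covariates described above, a unit shift in one covariate and a mean preserving redistribution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the data are made up, and the sample mean stands in for the population mean of the group-0 covariates.

```python
def unit_shift(x_rows, j):
    """Add one unit to covariate j of every observation (x + e_j)."""
    return [[v + 1.0 if i == j else v for i, v in enumerate(row)] for row in x_rows]

def mean_preserving(x_rows, alpha):
    """Map each row x to (1 - alpha) * mean + alpha * x, using the sample mean (assumption)."""
    n = len(x_rows)
    p = len(x_rows[0])
    means = [sum(row[i] for row in x_rows) / n for i in range(p)]
    return [[(1.0 - alpha) * means[i] + alpha * row[i] for i in range(p)] for row in x_rows]

x0 = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # illustrative group-0 covariates
x1_shift = unit_shift(x0, j=0)                  # first covariate raised by one unit
x1_mps = mean_preserving(x0, alpha=0.5)         # covariates pulled halfway toward their means
```

With `alpha` below one the transformed covariates keep their mean but become less dispersed; with `alpha` above one they become more dispersed.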

The  second  experiment  consists  of  generating  the  outcome  from  the  conditional  quantile 
function in group 1, $Q_{Y_1}(u|x)$, while keeping the marginal distributions of the covariates
as in group 0, that is, $X_0 \sim F_{X_0}$. The counterfactual outcome $Y_1^0$ is therefore generated by

$$Y_1^0 := Q_{Y_1}(U_1^0 \mid X_0), \quad \text{where } U_1^0 \sim U(0,1) \text{ independently of } X_0 \sim F_{X_0}. \tag{2.2}$$

This construction assumes that we can evaluate the quantile function $Q_{Y_1}(u|x)$ at each
point x in the support of $X_0$. This requires that either the support of $X_0$ is a subset of
the support of $X_1$ or we can extrapolate the quantile function outside the support of $X_1$.

In  this  second  experiment,  the  conditional  quantile  functions  before  and  after  the  policy 
intervention  may  arise  from  two  different  populations  or  subpopulations.  These  popu- 
lations might  correspond  to  different  demographic  groups,  time  periods,  or  geographic 
locations.  This  type  of  policy  is  useful  for  conceptualizing,  for  example,  what  the  distri- 
bution of  wages  for  female  workers  would  be  if  they  were  paid  as  male  workers  with  the 
same characteristics, or similarly for blacks or other minority groups.

We  formally  state  the  assumptions  mentioned  above  as  follows: 

Condition  M.  Counterfactual  outcome  variables  of  interest  are  generated  by  either 
(2.1)  or  (2.2).  The  conditional  distributions  of  the  outcome  given  the  covariates  in  both 
groups, namely the conditional quantile functions $Q_{Y_j}(\cdot|\cdot)$ or the conditional distribution
functions $F_{Y_j}(\cdot|\cdot)$ for $j \in \{0,1\}$, apply or can be extrapolated to all $x \in \mathcal{X}$, where $\mathcal{X}$ is a
compact subset of $\mathbb{R}^p$ that contains the supports of $X_0$ and $X_1$.

2.2.  Parameters  of  interest.  The  primary  (function-valued)  parameters  of  interest  are 
the  distribution  and  quantile  functions  of  the  outcome  before  and  after  the  policy  as  well 
as  functionals  derived  from  them. 


In  order  to  define  these  parameters,  we  first  recall  that  the  conditional  distribution 
associated with the quantile function $Q_{Y_j}(u|x)$ is:

$$F_{Y_j}(y|x) = \int_0^1 1\{Q_{Y_j}(u|x) \le y\}\, du, \quad j \in \{0,1\}. \tag{2.3}$$


Given our definitions (2.1) or (2.2) of the counterfactual outcome, the marginal distributions
of interest are:

$$F_{Y_j}^k(y) := \Pr\{Y_j^k \le y\} = \int_{\mathcal{X}} F_{Y_j}(y|x)\, dF_{X_k}(x), \quad j,k \in \{0,1\}. \tag{2.4}$$

The corresponding marginal quantile functions are:

$$Q_{Y_j}^k(u) = \inf\{y : F_{Y_j}^k(y) \ge u\}, \quad j,k \in \{0,1\}.$$

The u-quantile policy effect and the y-distribution policy effect are:

$$QE_{Y_j}^k(u) = Q_{Y_j}^k(u) - Q_{Y_0}^0(u) \quad \text{and} \quad DE_{Y_j}^k(y) = F_{Y_j}^k(y) - F_{Y_0}^0(y), \quad j,k \in \{0,1\}.$$

It  is  useful  to  mention  a  couple  of  examples  to  understand  the  notation.  For  instance, 
$Q_{Y_0}^1(u) - Q_{Y_0}^0(u)$ is the quantile effect under a policy that changes the marginal distribution
of covariates from $F_{X_0}$ to $F_{X_1}$, fixing the conditional distribution of outcome to $F_{Y_0}(y|x)$.
On the other hand, $Q_{Y_1}^0(u) - Q_{Y_0}^0(u)$ is the quantile effect under a policy that changes
the conditional distribution of the outcome from $F_{Y_0}(y|x)$ to $F_{Y_1}(y|x)$, fixing the marginal
distribution of covariates to $F_{X_0}$.
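A simple worked case may help fix the notation. Suppose, purely for illustration, that the only covariate is binary and write $p_k = \Pr(X_k = 1)$ for its mean in group k (the symbol $p_k$ is introduced here and does not appear in the paper). Holding the conditional distribution of group 0 fixed, the marginal distributions reduce to two-point mixtures:

```latex
\begin{align*}
F_{Y_0}^{k}(y) &= (1 - p_k)\, F_{Y_0}(y \mid 0) + p_k\, F_{Y_0}(y \mid 1), \qquad k \in \{0, 1\}, \\
F_{Y_0}^{1}(y) - F_{Y_0}^{0}(y) &= (p_1 - p_0)\,\bigl[ F_{Y_0}(y \mid 1) - F_{Y_0}(y \mid 0) \bigr].
\end{align*}
```

In this special case the distribution policy effect of changing the covariate distribution equals the change in the share, $p_1 - p_0$, times the gap between the two conditional distribution functions at y.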

Other  parameters  of  interest  include,  for  example,  Lorenz  curves  of  the  observed  and 
counterfactual  outcomes.  Lorenz  curves,  commonly  used  to  measure  inequality,  are  ratios 
of  partial  means  to  overall  means 

$$L(y, F_{Y_j}^k) = \int_0^y t\, dF_{Y_j}^k(t) \Big/ \int_0^\infty t\, dF_{Y_j}^k(t),$$

defined  for  non-negative  outcomes  only.  More  generally,  we  might  be  interested  in  arbitrary 
functionals  of  the  marginal  distributions  of  the  outcome  before  and  after  the  interventions 

$$H_Y(y) := \phi(y, F_{Y_0}^0, F_{Y_0}^1, F_{Y_1}^1, F_{Y_1}^0). \tag{2.5}$$

These functionals include the previous examples as special cases as well as other examples
such as means, with $H_Y(y) = \int t\, dF_{Y_j}^k(t) =: \mu_{Y_j}^k$; mean policy effects, with
$H_Y(y) = \mu_{Y_j}^k - \mu_{Y_0}^0$; variances, with $H_Y(y) = \int t^2\, dF_{Y_j}^k(t) - (\mu_{Y_j}^k)^2 =: (\sigma_{Y_j}^k)^2$;
variance policy effects, with $H_Y(y) = (\sigma_{Y_j}^k)^2 - (\sigma_{Y_0}^0)^2$; Lorenz policy effects, with
$H_Y(y) = L(y, F_{Y_j}^k) - L(y, F_{Y_0}^0) =: LE_{Y_j}^k(y)$; Gini coefficients, with
$H_Y(y) = 1 - 2 \int L(y, F_{Y_j}^k)\, dy =: G_{Y_j}^k$; and Gini policy
effects, with $H_Y(y) = G_{Y_j}^k - G_{Y_0}^0 =: GE_{Y_j}^k$.⁴
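To make the Lorenz curve defined above concrete, here is a minimal Python sketch of its sample analogue, the ratio of the partial sum of outcomes up to y to the total sum, together with the corresponding Lorenz policy effect. The function names and the two small samples are hypothetical and serve only as an illustration; in practice the samples would be replaced by draws from, or weights implied by, the estimated status quo and counterfactual distributions.

```python
def lorenz(y, sample):
    """Sample analogue of the ratio of the partial mean up to y to the overall mean."""
    if any(t < 0 for t in sample):
        raise ValueError("Lorenz curve is defined for non-negative outcomes only")
    total = sum(sample)
    if total == 0:
        raise ValueError("total outcome must be positive")
    partial = sum(t for t in sample if t <= y)
    return partial / total

def lorenz_effect(y, sample_policy, sample_status_quo):
    """Lorenz policy effect at y: curve under the policy minus curve under the status quo."""
    return lorenz(y, sample_policy) - lorenz(y, sample_status_quo)

status_quo = [1.0, 2.0, 3.0, 4.0]   # illustrative outcomes, not real data
policy = [2.0, 2.0, 3.0, 3.0]       # illustrative counterfactual outcomes
share_sq = lorenz(2.0, status_quo)  # share of total outcome held by units with outcome <= 2
effect = lorenz_effect(2.0, policy, status_quo)
```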

In  the  case  where  the  policy  consists  of  either  a  known  transformation  of  the  covariates, 
$X_1 = g(X_0)$, or a change in the conditional distribution of Y given X, we can also identify
the distribution and quantile functions for the effect of the policy, $\Delta_j^k = Y_j^k - Y_0^0$, by:

$$F_{\Delta_j}^k(\delta) = \int_{\mathcal{X}} \int_0^1 1\{Q_{\Delta_j}(u|x) \le \delta\}\, du\, dF_{X_0}(x), \quad j,k \in \{0,1\}, \tag{2.6}$$

where $Q_{\Delta_0}(u|x) = Q_{Y_0}(u|g(x)) - Q_{Y_0}(u|x)$ and $Q_{\Delta_1}(u|x) = Q_{Y_1}(u|x) - Q_{Y_0}(u|x)$; and

$$Q_{\Delta_j}^k(u) = \inf\{\delta : F_{\Delta_j}^k(\delta) \ge u\}, \quad j,k \in \{0,1\}, \tag{2.7}$$

under the additional assumption (Heckman, Smith, and Clements, 1997):

Condition RP. Conditional rank preservation: $U_0^1 = U_0^0 \mid X_0$ and $U_1^0 = U_0^0 \mid X_0$.

2.3.  Conditional  models.  The  preceding  analysis  shows  that  the  marginal  distribution 
and  quantile  functions  of  interest  depend  on  either  the  underlying  conditional  quantile 
function  or  conditional  distribution  function.  Thus,  we  can  proceed  by  modeling  and  esti- 
mating either  of  these  conditional  functions.  We  can  rely  on  several  principal  approaches 
to  carrying  out  these  tasks.  In  this  section  we  drop  the  dependence  on  the  group  index  to 
simplify  the  notation. 

Example  1.  Classical  regression  and  generalizations.  Classical  regression  is  one 
of  the  principal  approaches  to  modeling  and  estimating  conditional  quantile  functions. 
The  classical  location-shift  model  takes  the  form 

$$Y = m(X) + V, \quad V = Q_V(U), \tag{2.8}$$

where $U \sim U(0,1)$ is independent of X, and $m(\cdot)$ is a location function such as the
conditional mean. The disturbance V has the quantile function $Q_V(u)$, and Y therefore has
conditional quantile function $Q_Y(u|x) = m(x) + Q_V(u)$. This model is parsimonious in that
covariates  impact  the  outcome  only  through  the  location.  Even  though  this  is  a  location 
model,  it  is  clear  that  a  general  change  in  the  distribution  of  covariates  or  the  conditional 
quantile  function  can  have  heterogeneous  effects  on  the  entire  marginal  distribution  of  Y, 
affecting its various quantiles in a differential manner. The most common model for the
regression function m(x) is linear in parameters, $m(x) = x'\beta$, and we can estimate it using
least squares or instrumental variable methods. We can leave the quantile function $Q_V(u)$
unrestricted and estimate it using the empirical quantile function of the residuals. Our
results cover such common estimation schemes as special cases, since we only require the
estimates to satisfy a functional central limit theorem.

⁴ In the rest of the discussion we keep the distribution, quantile, quantile policy effects, and distribution
policy effects functions as separate cases to emphasize the importance of these functionals in practice.
Lorenz curves are special cases of the general functional with $H_Y(y) = \int_0^y t\, dF_{Y_j}^k(t) / \int_0^\infty t\, dF_{Y_j}^k(t)$, and
will not be considered separately.
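The estimation scheme just described for the location-shift model (least squares for the location function, empirical quantiles of the residuals for the disturbance) can be sketched as follows. This is a minimal illustration with a single covariate plus an intercept and made-up data; the function names are hypothetical, and the empirical quantile is taken as the left-inverse of the empirical distribution function, which is one common convention among several.

```python
import math

def fit_location_shift(x, y):
    """Least-squares fit of y = a + b*x, returning (a, b, sorted residuals)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    residuals = sorted(yi - (a + b * xi) for xi, yi in zip(x, y))
    return a, b, residuals

def empirical_quantile(sorted_values, u):
    """Left-inverse of the empirical distribution function: inf{v : F_n(v) >= u}."""
    n = len(sorted_values)
    index = max(math.ceil(u * n) - 1, 0)
    return sorted_values[index]

def conditional_quantile(u, x_value, a, b, residuals):
    """Plug-in conditional quantile: fitted location plus residual quantile."""
    return a + b * x_value + empirical_quantile(residuals, u)

x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 4.0, 6.0, 6.0]   # constructed as 1 + 2*x + v with v = (-1, 1, 1, -1)
a, b, residuals = fit_location_shift(x, y)
q_median_at_2 = conditional_quantile(0.5, 2.0, a, b, residuals)
```

Because the covariate enters only through the fitted location, the estimated conditional quantile curves at different values of x are parallel shifts of one another.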

The  location  model  has  played  a  classical  role  in  regression  analysis.  Many  endogenous 
and  exogenous  treatment  effects  models,  for  example,  can  be  analyzed  and  estimated 
using  variations  of  this  model  (Cameron  and  Trivedi,  2005  Chap.  25,  and  Imbens  and 
Wooldridge, 2008). A variety of standard survival and duration models also imply (2.8)
after a transformation, such as the Cox model with Weibull hazard or the accelerated failure
time model; cf. Doksum and Gasko (1990).

The  location-scale  shift  model  is  a  generalization  that  enables  the  covariates  to  impact 
the  conditional  distribution  through  the  scale  function  as  well: 

$$Y = m(X) + \sigma(X) \cdot V, \quad V = Q_V(U),$$

where $U \sim U(0,1)$ independently of X, and $\sigma(\cdot)$ is a positive scale function. In this model
the conditional quantile function takes the form $Q_Y(u|x) = m(x) + \sigma(x) Q_V(u)$. It is clear
that changes in the distribution of X or in $Q_Y(u|x)$ can have a nontrivial effect on the
entire marginal distribution of Y, affecting its various quantiles in a differential manner.
This  model  can  be  estimated  through  a  variety  of  means  (see,  e.g.,  Rutemiller  and  Bowers, 
1968,  and  Koenker  and  Xiao,  2002). 

Example  2.  Quantile  regression.  We  can  also  rely  on  quantile  regression  as  a 
principal  approach  to  modeling  and  estimating  conditional  quantile  functions.  In  this 
approach,  we  have  the  general  non-separable  representation 

$$Y = Q_Y(U \mid X).$$

The  model  permits  covariates  to  impact  the  outcome  by  changing  not  only  the  location 
and  scale  of  the  distribution  but  also  its  entire  shape.  An  early  convincing  example  of  such 
effects  goes  back  to  Doksum  (1974),  who  showed  that  real  data  can  be  sharply  inconsistent 
with  the  location-scale  shift  paradigm.  Quantile  regression  precisely  addresses  this  issue. 
The leading approach to quantile regression entails approximating the conditional quantile
function by a linear form $Q_Y(u|x) = x'\beta(u)$.⁵ Koenker (2005) provides an excellent review
of  this  method. 

Quantile  regression  allows  researchers  to  fit  parsimonious  models  to  the  entire  condi- 
tional distribution.  It  has  become  an  increasingly  important  empirical  tool  in  applied 
economics.  In  labor  economics,  for  example,  quantile  regression  has  been  widely  used  to 
model  changes  in  the  wage  distribution  (Buchinsky,  1994,  Chamberlain,  1994,  Abadie, 
1997, Gosling, Machin, and Meghir, 2000, Machado and Mata, 2005, Angrist, Chernozhukov,
and Fernandez-Val, 2006, and Autor, Katz, and Kearney, 2006b). Variations
of  quantile  regression  can  be  used  to  obtain  quantile  and  distribution  treatment  effects  in 
endogenous  and  exogenous  treatment  effects  models  (Abadie,  Angrist,  and  Imbens,  2002, 
Chernozhukov  and  Hansen,  2005,  and  Firpo,  2007). 

Example  3.  Duration  regression.  A  common  way  to  model  conditional  distribution 
functions  in  duration  and  survival  analysis  is  through  the  transformation  model: 

$$F_Y(y|x) = \exp(\exp(m(x) + t(y))), \tag{2.9}$$

where $t(\cdot)$ is a monotonic transformation. This model is rather rich, yet the role of covariates
is limited in an important way. In particular, the model leads to the following
location-shift representation:

$$t(Y) = m(X) + V,$$

where  V  has  an  extreme  value  distribution  and  is  independent  of  X .  Therefore,  covariates 
impact  a  monotone  transformation  of  the  outcome  only  through  the  location  function.  The 
estimation  of  this  model  is  the  subject  of  a  large  and  important  literature  (e.g.,  Lancaster, 
1990,  Donald,  Green,  and  Paarsch,  2000,  and  Dabrowska,  2005). 

Throughout, by "linear" we mean specifications that are linear in the parameters but could be highly non-linear in the original covariates; that is, if the original covariate is $X$, then the conditional quantile function takes the form $z'\beta(u)$ where $z = f(x)$.

Example 4. Distribution regression. Instead of restricting attention to transformation models for the conditional distribution, we can consider directly modeling $F_Y(y \mid x)$ separately for each threshold $y$. An example is the model

$$F_Y(y \mid x) = \Lambda(m(y, x)),$$

where $\Lambda$ is a known link function and $m(y, x)$ is unrestricted in $y$. This specification includes the previous example as a special case (put $\Lambda(v) = 1 - \exp(-\exp(v))$ and $m(y, x) = m(x) + t(y)$) and allows for a more flexible effect of the covariates. The leading example of this specification would be a probit or logit link function $\Lambda$ and $m(y, x) = x'\beta(y)$, where $\beta(y)$ is an unknown function in $y$ (Han and Hausman, 1990, and Foresi and Peracchi, 1995). This approach is similar in spirit to quantile regression. In particular, like quantile regression, this approach leads to the specification $Y = Q_Y(U \mid X) = m^{-1}(\Lambda^{-1}(U), X)$, where $U \sim U(0, 1)$ independently of $X$.
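To make the distribution regression idea concrete, the following is a minimal sketch, assuming a logit link and a hand-written Newton-Raphson fit; the function names (`fit_logit`, `distribution_regression`, `conditional_cdf`) and the simulated design are illustrative assumptions, not part of the procedures developed in this paper. One binary-response model is fitted for the indicator $1\{Y \le y\}$ at each threshold $y$.

```python
import numpy as np

def fit_logit(X, d, n_iter=50, ridge=1e-8):
    """Newton-Raphson for a logit model of binary d on X (X includes a constant)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        # Information matrix X' diag(p(1-p)) X, with a tiny ridge for stability.
        hess = (X * (p * (1 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def distribution_regression(X, Y, thresholds):
    """One logit fit per threshold y; row r holds beta_hat(thresholds[r])."""
    return np.array([fit_logit(X, (Y <= y).astype(float)) for y in thresholds])

def conditional_cdf(betas, x):
    """F_hat(y | x) = Lambda(x' beta_hat(y)) at each threshold."""
    return 1.0 / (1.0 + np.exp(-betas @ x))

# Simulated design (an assumption for illustration): Y = X + logistic noise,
# so that F(y | x) = Lambda(y - x), i.e. beta(y) = (y, -1).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X[:, 1] + rng.logistic(size=n)
thresholds = np.array([-1.0, 0.0, 1.0])
betas = distribution_regression(X, Y, thresholds)
F_hat = conditional_cdf(betas, np.array([1.0, 0.5]))
```

In this simulated design the intercept of each fit should track the threshold and the slope should stay near $-1$, which is the sense in which $\beta(y)$ is "unrestricted in $y$" while the link is held fixed.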

2.4. Policy estimators and inference questions. All of the preceding approaches generate estimates $\hat F_{Y_j}(y \mid x)$, $j \in \{0, 1\}$, of the conditional distribution functions either directly or indirectly using the relation (2.3):

$$\hat F_{Y_j}(y \mid x) = \int_0^1 1\{\hat Q_{Y_j}(u \mid x) \le y\}\, du, \quad j \in \{0, 1\}, \tag{2.10}$$

where $\hat Q_{Y_j}(u \mid x)$ is a given estimate of the conditional quantile function.

We then estimate the marginal distribution functions and quantile functions for the outcome by

$$\hat F_{Y_j}^k(y) = \int_{\mathcal X} \hat F_{Y_j}(y \mid x)\, dF_{X_k}(x), \quad \text{and} \quad \hat Q_{Y_j}^k(u) = \inf\{y : \hat F_{Y_j}^k(y) \ge u\},$$

respectively, for $j, k \in \{0, 1\}$. We estimate the quantile and distribution policy effects by

$$\widehat{QE}_{Y_j}^k(u) = \hat Q_{Y_j}^k(u) - \hat Q_{Y_0}^0(u), \quad \text{and} \quad \widehat{DE}_{Y_j}^k(y) = \hat F_{Y_j}^k(y) - \hat F_{Y_0}^0(y).$$

We estimate the general functional introduced in (2.5) similarly, using the plug-in rule:

$$\hat H_Y(y) = \phi(y, \hat F_{Y_0}^0, \hat F_{Y_1}^1, \hat F_{Y_0}^1, \hat F_{Y_1}^0). \tag{2.11}$$

For example, in this way we can construct estimates of the distribution and quantiles of the effects defined in (2.6) and (2.7).
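The chain from conditional quantiles to marginal distributions, marginal quantiles, and policy effects can be sketched numerically as follows. This is only an illustration under stated assumptions: the conditional quantile function `cond_quantile` is a hypothetical known function (standing in for an estimate), the covariate distributions are represented by samples `x0` and `x1`, and all integrals are approximated on grids.

```python
import numpy as np

def marginal_cdf(cond_quantile, x_sample, y_grid, u_grid):
    # Q[i, l] = Q(u_l | x_i). Averaging 1{Q <= y} over l approximates (2.10);
    # averaging over i integrates against the empirical covariate distribution.
    Q = np.array([cond_quantile(u_grid, x) for x in x_sample])
    return np.array([np.mean(Q <= y) for y in y_grid])

def left_inverse(F, y_grid, u):
    # inf{y : F(y) >= u}, restricted to the grid (F must be nondecreasing).
    idx = int(np.searchsorted(F, u, side="left"))
    return y_grid[min(idx, len(y_grid) - 1)]

def cond_quantile(u, x):
    # Hypothetical model: Y | X = x is uniform on [x, x + 1].
    return x + u

rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=500)     # covariates under the status quo
x1 = x0 + 0.5                            # counterfactual covariates (known shift)
u_grid = (np.arange(200) + 0.5) / 200
y_grid = np.linspace(-1.0, 4.0, 501)

F00 = marginal_cdf(cond_quantile, x0, y_grid, u_grid)   # status quo distribution
F01 = marginal_cdf(cond_quantile, x1, y_grid, u_grid)   # counterfactual distribution

# Quantile policy effect at the median and distribution policy effect at y = 1.
qe_median = left_inverse(F01, y_grid, 0.5) - left_inverse(F00, y_grid, 0.5)
i1 = int(np.argmin(np.abs(y_grid - 1.0)))
de_at_1 = F01[i1] - F00[i1]
```

Because the counterfactual covariates are a pure location shift of 0.5 in this toy design, the quantile effect should be close to 0.5 at every quantile index, up to the grid resolution.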

Common inference questions that arise in policy analysis involve features of the distribution of the outcome before and after the intervention. For example, we might be interested in the average effect of the policy, or in quantile policy effects at several quantiles to measure the impact of the policy on different parts of the outcome distribution. More generally, in this analysis many questions of interest involve the entire distribution or quantile functions of the outcome. Examples include the hypotheses that the policy has no effect, that the effect is constant, or that it is positive for the entire distribution (McFadden, 1989, Barrett and Donald, 2003, Koenker and Xiao, 2002, Linton, Maasoumi, and Whang, 2005). The statistical problem is to account for the sampling variability in the estimation of the conditional model to make inference on the functionals of interest. Section 3 provides limit distribution theory for the policy estimators. This theory applies to the entire marginal distribution and quantile functions of the outcome before and after the policy, and therefore is valid for performing either uniform inference about the entire distribution function, quantile function, or other functionals of interest, or pointwise inference about values of these functions at a specific point.

2.5. Alternative approaches. An alternative way to proceed with policy analysis is to use reweighting methods (DiNardo, 2002). Indeed, under Condition M, we can express the marginal distribution of the counterfactual outcome in (2.4) as

$$F_{Y_j}^k(y) = \int_{\mathcal X} \int_{\mathcal Y} 1\{\tilde y \le y\}\, w_j^k(x)\, dF_{Y_j}(\tilde y \mid x)\, dF_{X_j}(x), \quad j, k \in \{0, 1\}, \tag{2.12}$$

where $w_j^k(x) = f_{X_k}(x)/f_{X_j}(x) = p_j (1 - p_j(x))/[(1 - p_j)\, p_j(x)]$ for $k \ne j$, $p_j(x) := \Pr\{J = j \mid X = x\}$ is the propensity score, $p_j = \Pr\{J = j\}$, $J$ is the group indicator, $f_{X_j}$ is the density of the covariate given $J = j$, and $\mathcal Y$ is the support of $Y$. The second form of the weighting function $w_j^k$ follows from Bayes' rule. We can use the expression (2.12) along with either density or propensity score weighting to construct policy estimators. Firpo (2007) used a similar propensity score reweighting approach to derive efficient estimators of quantile effects in treatment effect models. With some work, one can adapt the results of Firpo (2007) to obtain the results needed to perform pointwise inference, namely, inference on quantile policy effects at a specific point. However, we need to do more work to develop the results needed to perform uniform inference on the entire quantile or distribution function. We are carrying out such work in a companion paper.
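The reweighting representation can be checked numerically in a simple discrete design. The sketch below is an illustration under assumptions of our own choosing: a covariate with three cells, propensity scores estimated by cell frequencies, and $j = 0$, $k = 1$. By Bayes' rule the weight is $f_{X_1}(x)/f_{X_0}(x) = p_0 (1 - p_0(x)) / [(1 - p_0)\, p_0(x)]$ with $p_0(x) = \Pr\{J = 0 \mid X = x\}$. With saturated cell-frequency estimators, the reweighted estimator and the regression-type plug-in estimator coincide algebraically.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
J = rng.integers(0, 2, size=n)                       # group indicator
# Group 0 visits cells {0, 1, 2}; group 1 only {1, 2}, so its support is covered.
X = np.where(J == 0, rng.integers(0, 3, size=n), rng.integers(1, 3, size=n))
Y = X + rng.normal(size=n)

cells = np.unique(X)
p0 = np.mean(J == 0)                                          # Pr{J = 0}
p0_x = np.array([np.mean(J[X == c] == 0) for c in cells])     # Pr{J = 0 | X = c}
w = p0 * (1 - p0_x) / ((1 - p0) * p0_x)                       # f_{X_1}(c) / f_{X_0}(c)

in0, in1 = J == 0, J == 1
w_i = w[np.searchsorted(cells, X[in0])]                       # weight of each group-0 unit

def reweighted_cdf(y):
    # Sample analogue of (2.12): weighted empirical CDF within group 0.
    return np.mean(w_i * (Y[in0] <= y))

def plugin_cdf(y):
    # Conditional CDF of group 0 in each cell, integrated over group 1's cells.
    F0_cell = np.array([np.mean(Y[in0 & (X == c)] <= y) for c in cells])
    f1_cell = np.array([np.mean(X[in1] == c) for c in cells])
    return np.sum(F0_cell * f1_cell)

gap = abs(reweighted_cdf(1.0) - plugin_cdf(1.0))
```

The exact agreement is special to saturated discrete designs; with continuous covariates the two approaches rely on different first-stage estimators and generally differ in finite samples.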

In a recent important development, Firpo, Fortin, and Lemieux (2007) propose an alternative useful procedure to estimate policy effects of changes in the distribution of $X$. Given a functional of interest $\phi$, they use a first order approximation of the policy effect:

$$\phi(F_{Y_0}^1) - \phi(F_{Y_0}^0) = \phi'(F_{Y_0}^1 - F_{Y_0}^0) + R(F_{Y_0}^1, F_{Y_0}^0),$$

where $\phi'(F_{Y_0}^1 - F_{Y_0}^0) = \int a(y, F_{Y_0}^0)\, d(F_{Y_0}^1(y) - F_{Y_0}^0(y))$ is the first order linear approximation term, where the function $a$ is the influence or score function, and $R(F_{Y_0}^1, F_{Y_0}^0)$ is the remaining approximation error. In the context of our problem, this approximation error is generally not equal to zero and does not vanish with the sample size. Firpo, Fortin, and Lemieux (2007) propose a practical mean regression method to estimate the first order term $\phi'(F_{Y_0}^1 - F_{Y_0}^0)$; this method cleverly exploits the law of iterated expectations and the linearity of the term in the distributions. In contrast to our approach, the estimand of this method is an approximation to the policy effect with a non-vanishing approximation error, whereas we directly estimate the exact effect $\phi(F_{Y_0}^1) - \phi(F_{Y_0}^0)$ without approximation error.

See Angrist and Pischke (2008) for a detailed review of propensity score methods and a comparison to regression methods in the context of treatment effect models. The pros and cons of these two methods are also likely to apply to policy analysis. In this paper we focus on the regression method.


3.  Limit  Distribution  and  Inference  Theory  for  Policy  Estimators 

In this section we provide a set of simple, general sufficient conditions that facilitate inference in large samples. We design the conditions to cover the principal practical approaches and to help us think about what is needed for various approaches to work. Even though the conditions are reasonably general, they do not exhaust all scenarios under which the main inferential methods will be valid.

3.1. Conditions on estimators of the conditional distribution and quantile functions. We provide general assumptions about the estimators of the conditional quantile or distribution function, which allow us to derive the limit distribution for the policy estimators constructed from them. These assumptions hold for commonly used parametric and semiparametric estimators of conditional distribution and quantile functions, such as classical, quantile, duration, and distribution regressions.

We begin the analysis by stating regularity conditions for estimators of conditional quantile functions, such as classical or quantile regression. In the sequel, let $\ell^\infty((0, 1) \times \mathcal X)$ denote the space of bounded functions mapping from $(0, 1) \times \mathcal X$ to $\mathbb R$, equipped with the uniform metric. We assume we have a sample $\{(X_i, Y_i), i = 1, \ldots, n\}$ of size $n$ for the outcome and covariates before the policy intervention. In this sample $n_0 = n/\lambda_0$ observations come from group 0 and $n_1 = n/\lambda_1$ observations come from group 1. In what follows we use $\Rightarrow$ to denote weak convergence.

Condition C. The conditional density $f_{Y_j}(y \mid x)$ of the outcome given covariates exists, and is continuous and bounded above and away from zero, uniformly on $y \in \mathcal Y$ and $x \in \mathcal X$, where $\mathcal Y$ is a compact subset of $\mathbb R$, for $j \in \{0, 1\}$.

Condition Q. The estimators $(u, x) \mapsto \hat Q_{Y_j}(u \mid x)$ of the conditional quantile functions $(u, x) \mapsto Q_{Y_j}(u \mid x)$ of the outcome given covariates jointly converge in law to continuous Gaussian processes:

$$\sqrt{n}\,\big(\hat Q_{Y_j}(u \mid x) - Q_{Y_j}(u \mid x)\big) \Rightarrow \sqrt{\lambda_j}\, V_j(u, x), \quad j \in \{0, 1\}, \tag{3.1}$$

in $\ell^\infty((0, 1) \times \mathcal X)$, where $(u, x) \mapsto V_j(u, x)$, $j \in \{0, 1\}$, have zero mean and covariance function $\Sigma_{V_{jr}}(u, x, \tilde u, \tilde x) := E[V_j(u, x) V_r(\tilde u, \tilde x)]$, for $j, r \in \{0, 1\}$.

These conditions appear reasonable in practice when the outcome is continuous. If the outcome is discrete, Conditions C and Q do not hold. However, in this case we can use the distribution approach discussed below. Conditions C and Q focus on the case where the outcome has a compact support with a density bounded away from zero, which is a reasonable first case to analyze in detail. Condition Q applies to the most common estimators of conditional quantile functions under suitable regularity conditions (Doss and Gill, 1992, Gutenbrunner and Jureckova, 1992, Angrist, Chernozhukov, and Fernandez-Val, 2006, and Appendix F). Conditions C and Q could be extended to include other cases without affecting subsequent results. For instance, given the set $\mathcal Y$ in Condition C over which we want to estimate the counterfactual distribution, Condition Q needs only to hold over the smaller region $\mathcal{UX} = \{(u, x) \in (0, 1) \times \mathcal X : Q_Y(u \mid x) \in \mathcal Y\} \subset (0, 1) \times \mathcal X$, which leads to a less restrictive convergence requirement. The joint convergence holds trivially if the samples for each group are mutually independent.

We next state regularity conditions for estimators of conditional distribution functions, such as duration or distribution regressions. Let $\ell^\infty(\mathcal Y \times \mathcal X)$ denote the space of bounded functions mapping from $\mathcal Y \times \mathcal X$ to $\mathbb R$, equipped with the uniform metric, where $\mathcal Y$ is a compact subset of $\mathbb R$.

Condition D. The estimators $(y, x) \mapsto \hat F_{Y_j}(y \mid x)$ of the conditional distribution functions $(y, x) \mapsto F_{Y_j}(y \mid x)$ of the outcome given covariates converge in law to continuous Gaussian processes:

$$\sqrt{n}\,\big(\hat F_{Y_j}(y \mid x) - F_{Y_j}(y \mid x)\big) \Rightarrow \sqrt{\lambda_j}\, Z_j(y, x), \quad j \in \{0, 1\}, \tag{3.2}$$

in $\ell^\infty(\mathcal Y \times \mathcal X)$, where $(y, x) \mapsto Z_j(y, x)$, $j \in \{0, 1\}$, have zero mean and covariance function $\Sigma_{Z_{jr}}(y, x, \tilde y, \tilde x) := E[Z_j(y, x) Z_r(\tilde y, \tilde x)]$, for $j, r \in \{0, 1\}$.

This  condition  holds  for  common  estimators  of  conditional  distribution  functions  (Beran, 
1977,  Burr  and  Doss,  1993,  and  Appendix  F).  These  estimators,  however,  might  produce 
estimates  that  are  not  monotonic  in  the  level  of  the  outcome  y  (Foresi  and  Peracchi,  1995, 
and  Hall,  Wolff,  and  Yao,  1999).  A  way  to  avoid  this  problem  and  to  improve  the  finite 
sample  properties  of  the  conditional  distribution  estimators  is  by  rearranging  the  estimates 
(Chernozhukov, Fernandez-Val, and Galichon, 2006). The joint convergence holds trivially
if  the  samples  for  each  group  are  mutually  independent. 
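On a finite grid of thresholds, rearrangement amounts to sorting the estimated values of the distribution function. The following minimal sketch illustrates this; the function name `rearrange_cdf` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rearrange_cdf(F_hat):
    """Monotone rearrangement on a grid: sort the estimates along the y axis (last axis)."""
    return np.sort(F_hat, axis=-1)

# Hypothetical non-monotone estimates of F(y | x) on an increasing grid of y values.
F_raw = np.array([0.10, 0.25, 0.22, 0.40, 0.38, 0.70])
F_mon = rearrange_cdf(F_raw)

# A hypothetical monotone target, used only to compare uniform distances.
F_true = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.70])
err_raw = np.max(np.abs(F_raw - F_true))
err_mon = np.max(np.abs(F_mon - F_true))
```

In this example the rearranged estimate is monotone and is no farther from the monotone target in the uniform norm than the original estimate, which is the finite-sample improvement referred to above.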
If we start from a conditional quantile estimator $\hat Q_{Y_j}(u \mid x)$, we can define the conditional distribution function estimator $\hat F_{Y_j}(y \mid x)$ using the relation (2.10). It turns out that if the original quantile estimator satisfies Conditions C and Q, then the resulting conditional distribution estimator satisfies Condition D. This result allows us to give a unified treatment of the policy estimators based on either quantile or distribution estimators.

Lemma 1. Under Conditions C and Q, the estimators of the conditional distribution function defined by (2.10) satisfy Condition D with

$$Z_j(y, x) = -f_{Y_j}(y \mid x)\, V_j(F_{Y_j}(y \mid x), x), \quad j \in \{0, 1\}.$$

3.2. Examples of Conditional Estimators. Here we verify that the principal estimators of conditional distribution and quantile functions satisfy the functional central limit theorem, which we required to hold in our main Conditions D and Q. In this section we drop the dependence on the group index to simplify the notation.

Example 1 continued. Classical regression. Consider the classical linear regression model $Y = X'\beta_0 + V$, where the disturbance $V$ is independent of $X$ and has mean zero, finite variance and quantile function $\alpha_0(u)$. In this case, we can estimate $\beta_0$ by mean regression and quantiles of $V$ by the empirical quantile function of the residuals. We show in Appendix F that the resulting estimator $\hat\theta(u) = (\hat\alpha(u), \hat\beta')'$ of $\theta_0(u) = (\alpha_0(u), \beta_0')'$ obeys a functional central limit theorem $\sqrt{n}(\hat\theta(u) - \theta_0(u)) \Rightarrow G_0(u)^{-1} Z(u)$, where $Z$ is a zero mean Gaussian process with covariance function $\Omega(u, \tilde u)$ specified in (F.6) and matrix $G_0(u) := G(\alpha_0(u), \beta_0, u)$ specified in (F.5). The resulting estimator, $\hat Q_Y(u \mid x) = \hat\alpha(u) + x'\hat\beta$, of the conditional quantile function $Q_Y(u \mid x)$ obeys a functional central limit theorem,

$$\sqrt{n}\,\big(\hat Q_Y(u \mid x) - Q_Y(u \mid x)\big) \Rightarrow (1, x')\, G_0(u)^{-1} Z(u) =: V(u, x),$$

in $\ell^\infty((0, 1) \times \mathcal X)$, where $V(u, x)$ is a zero mean Gaussian process with covariance function,

$$\Sigma_V(u, x, \tilde u, \tilde x) = (1, x')\, G_0(u)^{-1}\, \Omega(u, \tilde u)\, [G_0(\tilde u)^{-1}]'\, (1, \tilde x')'.$$

Example 2 continued. Quantile regression. Consider a linear quantile regression model where $Q_Y(u \mid x) = x'\beta_0(u)$. In Appendix F we show the canonical quantile regression estimator satisfies a functional central limit theorem, $\sqrt{n}(\hat\beta(u) - \beta_0(u)) \Rightarrow G_0(u)^{-1} Z(u)$, where $Z(u)$ is a zero mean Gaussian process with covariance function $\Omega(u, \tilde u) = \{\min(u, \tilde u) - u \cdot \tilde u\}\, E[XX']$ and $G_0(u) := G(\beta_0(u), u) = -E[f_Y(X'\beta_0(u) \mid X)\, XX']$. The estimator of the conditional quantile function also obeys a functional central limit theorem,

$$\sqrt{n}\,\big(\hat Q_Y(u \mid x) - Q_Y(u \mid x)\big) = \sqrt{n}\,\big(x'\hat\beta(u) - x'\beta_0(u)\big) \Rightarrow x' G_0(u)^{-1} Z(u) =: V(u, x),$$

in $\ell^\infty((0, 1) \times \mathcal X)$, where $V(u, x)$ is a zero mean Gaussian process with covariance function given by:

$$\Sigma_V(u, x, \tilde u, \tilde x) = x' G_0(u)^{-1}\, \Omega(u, \tilde u)\, G_0(\tilde u)^{-1}\, \tilde x.$$

Example 3 continued. Duration regression. Consider the transformation model for the conditional distribution function stated in equation (2.9). A common duration model that gives rise to this specification is the proportional hazard model of Cox (1972), where the conditional hazard rate of an individual with covariate vector $x$ is $\lambda_Y(y \mid x) = \lambda_0(y) \exp(x'\beta_0)$, $\beta_0$ is a $p$-vector of regression coefficients, $\lambda_0$ is the nonnegative baseline hazard rate function, and $y \in \mathcal Y = [0, \bar y]$ for some maximum duration $\bar y$. Let $\Lambda_0(y) = \int_0^y \lambda_0(\tilde y)\, d\tilde y$ denote the integrated baseline hazard function. Then $F_Y(y \mid x) = 1 - \exp\{-\exp(x'\beta_0 + \ln \Lambda_0(y))\}$, delivering the transformation model (2.9) with $t(y) = \ln \Lambda_0(y)$ and $m(x) = x'\beta_0$.

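In order to discuss estimation, let us assume i.i.d. sampling of $(Y_i, X_i)$ without censoring. Then Cox's (1972) partial maximum likelihood estimator of $\beta_0$ takes the form

$$\hat\beta = \arg\max_{\beta} \int_0^{\bar y} \sum_{i=1}^n \log\Big\{ J_i(y) \exp(x_i'\beta) \Big/ \sum_{j=1}^n J_j(y) \exp(x_j'\beta) \Big\}\, dN_i(y),$$

and the Breslow-Nelson-Aalen estimator of $\Lambda_0$ takes the form

$$\hat\Lambda(y) = \int_0^y \Big\{ \sum_{j=1}^n J_j(\tilde y) \exp(x_j'\hat\beta) \Big\}^{-1} d\Big\{ \sum_{i=1}^n N_i(\tilde y) \Big\},$$

where $N_i(y) := 1\{Y_i \le y\}$ and $J_i(y) := 1\{Y_i \ge y\}$, $y \in \mathcal Y$; see Breslow (1972, 1974).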
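Without censoring, the Breslow-Nelson-Aalen estimator is a step function that jumps at each observed duration. A minimal sketch follows, taking the coefficient vector as given (the partial likelihood step is not implemented here); the function name `breslow` and the three-observation data set are assumptions for illustration.

```python
import numpy as np

def breslow(y_grid, Y, X, beta):
    """Lambda_hat(y) = sum over {i : Y_i <= y} of 1 / sum_j 1{Y_j >= Y_i} exp(x_j' beta)."""
    risk = np.exp(X @ beta)                      # exp(x_j' beta)
    at_risk = Y[None, :] >= Y[:, None]           # entry (i, j) is J_j(Y_i)
    denom = at_risk @ risk                       # sum_j J_j(Y_i) exp(x_j' beta)
    jumps = 1.0 / denom                          # jump of Lambda_hat at Y_i
    return np.array([np.sum(jumps[Y <= y]) for y in y_grid])

# Tiny check with beta = 0, where the estimator reduces to Nelson-Aalen.
Y = np.array([3.0, 1.0, 2.0])
X = np.zeros((3, 1))
beta = np.zeros(1)
y_grid = np.array([1.0, 2.0, 3.0])
Lambda_hat = breslow(y_grid, Y, X, beta)

# Implied conditional distribution at covariate value x: 1 - exp(-Lambda_hat * exp(x' beta)).
x = np.zeros(1)
F_hat = 1.0 - np.exp(-Lambda_hat * np.exp(x @ beta))
```

With three distinct durations and a zero coefficient, the risk sets have sizes 3, 2, and 1, so the cumulative hazard takes the values 1/3, 1/3 + 1/2, and 1/3 + 1/2 + 1.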

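Let $W$ denote a standard Brownian motion on $\mathcal Y$ and let $Z$ denote an independent $p$-dimensional standard normal vector. Andersen and Gill (1982) show that

$$\sqrt{n}\,\big(\hat\beta - \beta_0,\ \hat\Lambda(y) - \Lambda_0(y)\big) \Rightarrow \big(\Sigma^{-1/2} Z,\ W(a(y)) - b(y)'\Sigma^{-1/2} Z\big)$$

in $\mathbb R^p \times \ell^\infty(\mathcal Y)$, with the terms $a(y)$, $b(y)$, and $\Sigma$, and regularity conditions defined in Andersen and Gill (1982) and Burr and Doss (1993). Let $\hat F_Y(y \mid x) = 1 - \exp\{-\exp(x'\hat\beta + \log \hat\Lambda(y))\}$ be the estimator of $F_Y(y \mid x)$. Since $F_Y(y \mid x)$ is Hadamard-differentiable in $(\beta, \Lambda)$, by the functional delta method we have the functional central limit theorem

$$\sqrt{n}\,\big(\hat F_Y(y \mid x) - F_Y(y \mid x)\big) \Rightarrow \{1 - F_Y(y \mid x)\}\,\{\exp(x'\beta_0)\, W(a(y)) + b(y, x)'\Sigma^{-1/2} Z\} =: Z(y, x),$$

in $\ell^\infty(\mathcal Y \times \mathcal X)$, where $b(y, x) = \Lambda_Y(y \mid x)\, x - \exp(x'\beta_0)\, b(y)$, with $\Lambda_Y(y \mid x) := \Lambda_0(y) \exp(x'\beta_0)$ the conditional integrated hazard, and $Z(y, x)$ is a zero mean Gaussian process with covariance function, for $y \le \tilde y$,

$$\Sigma_Z(y, x, \tilde y, \tilde x) = \{1 - F_Y(y \mid x)\}\{1 - F_Y(\tilde y \mid \tilde x)\}\,\{\exp(x'\beta_0) \exp(\tilde x'\beta_0)\, a(y) + b(y, x)'\Sigma^{-1} b(\tilde y, \tilde x)\}.$$

In Appendix F we also discuss another estimator of this model.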

Example 4 continued. Distribution regression. Consider the model $F_Y(y \mid x) = \Lambda(x'\beta_0(y))$ for the conditional distribution function, where $\Lambda$ is a known link function, such as the logistic or normal distribution. We can estimate the function $\beta_0(y)$ by applying maximum likelihood to the indicator variables $1\{Y \le y\}$ for each value of $y \in \mathcal Y$ separately. In Appendix F, we prove that the resulting estimator $\hat\beta(y)$ of $\beta_0(y)$ obeys a functional central limit theorem

$$\sqrt{n}\,\big(\hat\beta(y) - \beta_0(y)\big) \Rightarrow -G_0(y)^{-1} Z(y),$$

where $G_0(y) := G(\beta_0(y), y) = E[\lambda[X'\beta_0(y)]^2\, XX' / \{\Lambda[X'\beta_0(y)](1 - \Lambda[X'\beta_0(y)])\}]$, $\lambda$ is the derivative of $\Lambda$, and $Z(y)$ is a zero mean Gaussian process with covariance function

$$\Omega(y, \tilde y) = E\big[XX'\, \lambda[X'\beta_0(y)]\, \lambda[X'\beta_0(\tilde y)] / \{\Lambda[X'\beta_0(y)](1 - \Lambda[X'\beta_0(\tilde y)])\}\big],$$

for $y \ge \tilde y$. Hence the resulting estimator $\hat F_Y(y \mid x) := \Lambda(x'\hat\beta(y))$ of the conditional distribution function also obeys the functional central limit theorem,

$$\sqrt{n}\,\big(\hat F_Y(y \mid x) - F_Y(y \mid x)\big) \Rightarrow -\lambda[x'\beta_0(y)]\, x' G_0(y)^{-1} Z(y) =: Z(y, x),$$

in $\ell^\infty(\mathcal Y \times \mathcal X)$, where $Z(y, x)$ is a zero mean Gaussian process with covariance function:

$$\Sigma_Z(y, x, \tilde y, \tilde x) = \lambda[x'\beta_0(y)]\, \lambda[\tilde x'\beta_0(\tilde y)]\, x' G_0(y)^{-1}\, \Omega(y, \tilde y)\, G_0(\tilde y)^{-1}\, \tilde x.$$


3.3. Basic principles underlying the limit theory. The derivation of the limit theory for policy estimators relies on several basic principles that allow us to link the properties of the estimators of conditional (quantile and distribution) functions with the properties of estimators of marginal functions. First, although there does not exist a direct connection between conditional and marginal quantiles, we can always switch from conditional quantiles to conditional distributions using Lemma 1, then use the law of iterated expectations to go from conditional distribution to marginal distribution, and finally get to marginal quantiles by inverting. Second, as the functionals of interest depend on the entire conditional function, we must rely on the functional delta method to obtain the limit theory for these functionals as well as to obtain intermediate limit results such as Lemma 1. Since the estimated conditional distributions and quantile functions are usually non-monotone and discontinuous in finite samples, we must use refined forms of the functional delta method.

Accordingly, the key ingredient in the derivation and one of the main theoretical contributions of the paper is the demonstration of the Hadamard differentiability of the functionals of interest with respect to the limit of the conditional processes, tangentially to the subspace of continuous functions. Indeed, we need this refined form of differentiability to deal with our conditional processes, which typically are discontinuous random functions in finite samples yet converge to continuous random functions in large samples. These refined differentiability results in turn enable us to use the functional delta method to derive all of the following limit distribution and inference theory.

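3.4. Limit theory for counterfactual distribution and quantile functions. Our first main result shows that the estimators of the marginal distribution and quantile functions before and after the policy intervention satisfy a functional central limit theorem.

Theorem 1 (Limit distribution for marginal distribution functions). Under Conditions M and D, the estimators $\hat F_{Y_j}^k(y)$ of the marginal distribution functions $F_{Y_j}^k(y)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\hat F_{Y_j}^k(y) - F_{Y_j}^k(y)\big) \Rightarrow \sqrt{\lambda_j} \int_{\mathcal X} Z_j(y, x)\, dF_{X_k}(x) =: \sqrt{\lambda_j}\, Z_j^k(y), \quad j, k \in \{0, 1\}, \tag{3.3}$$

in $\ell^\infty(\mathcal Y)$, where $y \mapsto Z_j^k(y)$, $j, k \in \{0, 1\}$, have zero mean and covariance function, for $j, k, r, s \in \{0, 1\}$,

$$\Sigma_{Z_{jr}}^{ks}(y, \tilde y) := E[Z_j^k(y) Z_r^s(\tilde y)] = \int_{\mathcal X} \int_{\mathcal X} \Sigma_{Z_{jr}}(y, x, \tilde y, \tilde x)\, dF_{X_k}(x)\, dF_{X_s}(\tilde x). \tag{3.4}$$

Theorem 2 (Limit distribution for marginal quantile functions). Under Conditions M, C, and D the estimators $\hat Q_{Y_j}^k(u)$ of the marginal quantile functions $Q_{Y_j}^k(u)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\hat Q_{Y_j}^k(u) - Q_{Y_j}^k(u)\big) \Rightarrow -\sqrt{\lambda_j}\, Z_j^k(Q_{Y_j}^k(u)) / f_{Y_j}^k(Q_{Y_j}^k(u)) =: \sqrt{\lambda_j}\, V_j^k(u), \quad j, k \in \{0, 1\}, \tag{3.5}$$

in $\ell^\infty((0, 1))$, where $f_{Y_j}^k(y) = \int_{\mathcal X} f_{Y_j}(y \mid x)\, dF_{X_k}(x)$, and $u \mapsto V_j^k(u)$, $j, k \in \{0, 1\}$, have zero mean and covariance function, for $j, k, r, s \in \{0, 1\}$,

$$\Sigma_{V_{jr}}^{ks}(u, \tilde u) := E[V_j^k(u) V_r^s(\tilde u)] = \Sigma_{Z_{jr}}^{ks}\big(Q_{Y_j}^k(u), Q_{Y_r}^s(\tilde u)\big) \big/ \big[f_{Y_j}^k(Q_{Y_j}^k(u))\, f_{Y_r}^s(Q_{Y_r}^s(\tilde u))\big].$$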

Our second main result shows that the estimators of the marginal quantile and distribution policy effects also satisfy a functional central limit theorem.

Corollary 1 (Limit distribution for quantile policy effects). Under Conditions M, C, and D the estimators of the quantile policy effects converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\widehat{QE}_{Y_j}^k(u) - QE_{Y_j}^k(u)\big) \Rightarrow \sqrt{\lambda_j}\, V_j^k(u) - \sqrt{\lambda_0}\, V_0^0(u) =: W_j^k(u), \quad j, k \in \{0, 1\}, \tag{3.6}$$

in the space $\ell^\infty((0, 1))$, where the processes $u \mapsto W_j^k(u)$, $j, k \in \{0, 1\}$, have zero mean and covariance function $\Sigma_{W_{jr}}^{ks}(u, \tilde u) := E[W_j^k(u) W_r^s(\tilde u)]$, for $j, k, r, s \in \{0, 1\}$.

Corollary 2 (Limit distribution for distribution policy effects). Under Conditions M and D the estimators of the distribution policy effects converge in law to the following Gaussian processes:

$$\sqrt{n}\,\big(\widehat{DE}_{Y_j}^k(y) - DE_{Y_j}^k(y)\big) \Rightarrow \sqrt{\lambda_j}\, Z_j^k(y) - \sqrt{\lambda_0}\, Z_0^0(y) =: S_j^k(y), \quad j, k \in \{0, 1\}, \tag{3.7}$$

in the space $\ell^\infty(\mathcal Y)$, where the processes $y \mapsto S_j^k(y)$, $j, k \in \{0, 1\}$, have zero mean and covariance function $\Sigma_{S_{jr}}^{ks}(y, \tilde y) := E[S_j^k(y) S_r^s(\tilde y)]$, for $j, k, r, s \in \{0, 1\}$.

Our third main result shows that various functionals of the status quo and counterfactual marginal distribution and quantile functions satisfy a functional central limit theorem.

Corollary 3 (Limit distribution for differentiable functionals). Let $H_Y(y) = \phi(y, F_{Y_0}^0, F_{Y_1}^1, F_{Y_0}^1, F_{Y_1}^0)$, a functional taking values in $\ell^\infty(\mathcal Y)$, be Hadamard differentiable in $(F_{Y_0}^0, F_{Y_1}^1, F_{Y_0}^1, F_{Y_1}^0)$ tangentially to the subspace of continuous functions with derivative $(\phi'_{00}, \phi'_{11}, \phi'_{01}, \phi'_{10})$. Then under Conditions M and D the plug-in estimator $\hat H_Y(y)$ defined in (2.11) converges in law to the following Gaussian process:

$$\sqrt{n}\,\big(\hat H_Y(y) - H_Y(y)\big) \Rightarrow \sum_{j, k \in \{0, 1\}} \sqrt{\lambda_j}\, \phi'_{jk}(y, F_{Y_0}^0, F_{Y_1}^1, F_{Y_0}^1, F_{Y_1}^0)\, Z_j^k(y) =: T_H(y), \tag{3.8}$$

in $\ell^\infty(\mathcal Y)$, where $y \mapsto T_H(y)$ has zero mean and covariance function $\Sigma_{T_H}(y, \tilde y) := E[T_H(y) T_H(\tilde y)]$.

Examples of functionals covered by Corollary 3 include function-valued parameters, such as Lorenz curves and Lorenz policy effects, as well as scalar-valued parameters, such as Gini coefficients and Gini policy effects (Barrett and Donald, 2009). These examples also include quantile and distribution functions of the effect of the policy defined under Condition RP; in Appendix C we state the results for these effects separately in order to give them some emphasis.

3.5. Uniform inference and resampling methods. We can readily apply the preceding limit distribution results to perform inference on the distributions and quantiles of the outcome before and after the policy at a specific point. For example, Corollary 1 implies that the quantile policy effect estimator for a given quantile $u$ is asymptotically normal with mean $QE_{Y_j}^k(u)$ and variance $\Sigma_{W_{jj}}^{kk}(u, u)/n$. We can therefore perform inference on $QE_{Y_j}^k(u)$ for a particular quantile index $u$ using this normal distribution and replacing $\Sigma_{W_{jj}}^{kk}(u, u)$ by a consistent estimate.
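For a single quantile index, the normal approximation translates into the usual interval: estimate plus or minus a normal critical value times the square root of the estimated variance over $n$. The sketch below illustrates the arithmetic only; the numbers passed in (`qe_hat`, `sigma2_hat`, `n`) are hypothetical placeholders rather than estimates from any data set.

```python
import numpy as np
from statistics import NormalDist

def pointwise_ci(qe_hat, sigma2_hat, n, level=0.95):
    """Normal-approximation interval for a quantile policy effect at one quantile index."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # two-sided critical value
    half = z * np.sqrt(sigma2_hat / n)               # z times the standard error
    return qe_hat - half, qe_hat + half

# Hypothetical inputs: point estimate, consistent variance estimate, sample size.
ci_low, ci_high = pointwise_ci(qe_hat=0.12, sigma2_hat=4.0, n=400)
```

Such an interval is valid only for the single index at which it is computed; it says nothing about the joint behavior of the effect across quantile indices.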
However, pointwise inference permits looking at the effect of the policy at a specific
point  only.  This  approach  might  be  restrictive  for  policy  analysis  where  the  quantities  and 
hypotheses  of  interest  usually  involve  many  points  or  a  continuum  of  points.  That  is,  the 
entire  distribution  or  quantile  function  of  the  observed  and  counterfactual  outcomes  is  often 
of  interest.  For  example,  in  order  to  test  hypotheses  of  the  policy  having  no  effect  on  the 
distribution,  having  a  constant  effect  throughout  the  distribution,  or  having  a  first  order 
dominance  effect,  we  must  use  the  entire  outcome  distribution,  and  not  only  a  single  specific 
point.  Moreover,  simultaneous  inference  corrections  to  pointwise  procedures  based  on  the 
normal  distribution,  such  as  Bonferroni-type  corrections,  can  be  very  conservative  for 
simultaneous  testing  of  highly  dependent  hypotheses,  and  become  completely  inadequate 
for  testing  a  continuum  of  hypotheses. 

A convenient and computationally attractive approach for performing inference on function-valued parameters is to use Kolmogorov-Smirnov type procedures. Some complications arise in our case because the limit processes are non-pivotal, as their covariance functions depend on unknown, though estimable, nuisance parameters. A practical and valid way to deal with non-pivotality is to use resampling and related simulation methods. An attractive feature of our theoretical analysis is that validity of resampling and simulation methods follows from the Hadamard differentiability of the policy functionals with respect to the underlying conditional functions. Indeed, given that bootstrap and other methods can consistently estimate the limit laws of the estimators of the conditional distribution and quantile functions, they also consistently estimate the limit laws of our policy estimators. This convenient result follows from preservation of validity of bootstrap and other resampling methods for estimating laws of Hadamard differentiable functionals; see more on this in Lemma 6 in Appendix A.

Theorem 3 (Validity of bootstrap and other simulation methods for estimating the laws of policy estimators of function-valued parameters). If the bootstrap or any other simulation method consistently estimates the laws of the limit stochastic processes (3.1) and (3.2) for the estimators of the conditional quantile or distribution function, then this method also consistently estimates the laws of the limit stochastic processes (3.3), (3.5), (3.6), (3.7), and (3.8) for policy estimators of marginal distribution and quantile functions and other functionals.


Similar non-pivotality issues arise in a variety of goodness-of-fit problems studied by Durbin and others, and are referred to as the Durbin problem by Koenker and Xiao (2002).

Theorem 3 shows that the bootstrap is valid for estimating the limit laws of various inferential processes. This is true provided that the bootstrap is valid for estimating the limit laws of the (function-valued) estimators of the conditional distribution and quantile functions. This is a reasonable condition, but, to the best of our knowledge, there are no results in the literature that verify this condition for our principal estimators. Indeed, the previous results on the bootstrap established its validity only for estimating the pointwise laws of our principal estimators, which is not sufficient for our purposes. To overcome this difficulty, in Appendix F we prove validity of the empirical bootstrap and other related methods, such as Bayesian bootstrap, wild bootstrap, k out of n bootstrap, and subsampling bootstrap, for estimating the laws of function-valued estimators, such as quantile regression and distribution regression processes. These results may be of substantial independent interest.

We can then use Theorem 3 to construct the usual uniform bands and perform inference on the marginal distribution and quantile functions, and various functionals, as described in detail in Chernozhukov and Fernandez-Val (2005) and Angrist, Chernozhukov, and Fernandez-Val (2006). Moreover, if the sample size is large, we can reduce the computational complexity of the inference procedure by resampling the first order approximation to the estimators of the conditional distribution and quantile functions (Chernozhukov and Hansen, 2006); by using subsampling bootstrap (Chernozhukov and Fernandez-Val, 2005); or by simulating the limit processes $Z_j$ or $V_j$, $j \in \{0, 1\}$, appearing in expressions (3.1) and (3.2), using multiplier methods (Barrett and Donald, 2003).
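The construction of a uniform band from bootstrap draws can be sketched as follows. This is a generic illustration under assumptions of our own: the function-valued estimator is a toy empirical quantile function on a grid (not one of the policy estimators), the band is a studentized Kolmogorov-Smirnov type (sup-t) band, and the names `uniform_band`, `est`, and `boot` are hypothetical.

```python
import numpy as np

def uniform_band(est, boot, level=0.95):
    """Sup-t band. est has shape (G,); boot has shape (B, G) with bootstrap draws."""
    se = boot.std(axis=0, ddof=1)                        # pointwise bootstrap standard errors
    t_max = np.max(np.abs(boot - est) / se, axis=1)      # KS-type statistic for each draw
    crit = np.quantile(t_max, level)                     # simulated critical value
    return est - crit * se, est + crit * se, crit

# Toy function-valued estimator: empirical quantiles of one sample on a grid of indices.
rng = np.random.default_rng(2)
sample = rng.normal(size=300)
u_grid = np.linspace(0.1, 0.9, 9)
est = np.quantile(sample, u_grid)
boot = np.array([np.quantile(rng.choice(sample, size=sample.size), u_grid)
                 for _ in range(500)])

lower, upper, crit = uniform_band(est, boot)

# Share of bootstrap draws that lie inside the band at every grid point.
inside = np.mean(np.all((boot >= lower) & (boot <= upper), axis=1))
```

The critical value is taken from the distribution of the maximal studentized deviation over the whole grid, so the band is wider than a collection of pointwise intervals but covers the entire function jointly; applied to the policy estimators, each bootstrap draw would require recomputing the conditional estimator and the plug-in functional.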

3.6. Incorporating uncertainty about the distribution of the covariates. In the preceding analysis we assumed that we know the distributions of the covariates before and after the policy intervention for the target population. In practice, however, we usually observe such distributions only for individuals in the sample. If the individuals in the sample are the target population, then the previous limit theory is valid for performing inference without any adjustments. If a more general population group is the target population, then the distributions of the covariates need to be estimated, and the previous limit theory needs to be adjusted to take this into account. Here we highlight the main ideas, while in Appendix D we present formal distribution and inference theory.

Exceptions include Chernozhukov and Hansen (2006) and Chernozhukov and Fernandez-Val (2005),
but they looked at forms of subsampling only.

We begin by assuming that the estimators $x \mapsto \hat F_{X_k}(x)$, $k \in \{0, 1\}$, of the covariate
distribution functions are well behaved, specifically that they converge jointly in law to
Gaussian processes $B_{X_k}$, $k \in \{0, 1\}$:

$$\sqrt{n}\left(\hat F_{X_k}(x) - F_{X_k}(x)\right) \Rightarrow \sqrt{\lambda_k}\, B_{X_k}(x), \quad k \in \{0, 1\},$$


as rigorously defined in Appendix D.1. This assumption is quite general and holds for
conventional  estimators  such  as  the  empirical  distribution  under  i.i.d.  sampling  as  well  as 
various  modifications  of  conventional  estimators,  as  discussed  further  in  Appendix  D.  The 
joint  convergence  holds  trivially  in  the  leading  cases  where  the  distribution  in  group  1  is 
a  known  transformation  of  the  distribution  in  group  0,  or  when  the  two  distributions  are 
estimated  from  independent  samples. 

The estimation of the covariate distributions affects the limit distributions of the functionals
of interest. Consider, for example, the marginal distribution functions. When the
covariate distributions are unknown, the plug-in estimators for these functions take the
form $\hat F_{Y_j}^k(y) = \int_{\mathcal{X}_k} \hat F_{Y_j}(y|x)\, d\hat F_{X_k}(x)$, $j, k \in \{0, 1\}$. The limit processes for these
estimators become

$$\sqrt{n}\left(\hat F_{Y_j}^k(y) - F_{Y_j}^k(y)\right) \Rightarrow \sqrt{\lambda_j}\, Z_j^k(y) + \sqrt{\lambda_k} \int_{\mathcal{X}_k} F_{Y_j}(y|x)\, dB_{X_k}(x), \quad j, k \in \{0, 1\},$$

where the familiar first component arises from the estimation of the conditional distribution
and the second comes from the estimation of the distributions of the covariates. In
Appendix D we discuss further details.
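The plug-in principle for the marginal distribution amounts to averaging an estimated conditional distribution over an empirical covariate distribution. The sketch below illustrates this under stated assumptions: the conditional law `toy_cond_cdf` (a normal location model) and the two covariate samples are hypothetical stand-ins, not the estimators or data used in the paper.

```python
from statistics import NormalDist


def plug_in_cdf(cond_cdf, covariates, y):
    """Average a conditional CDF over an empirical covariate distribution."""
    return sum(cond_cdf(y, x) for x in covariates) / len(covariates)


def toy_cond_cdf(y, x):
    # Hypothetical conditional law: Y | X = x ~ N(1 + 2x, 1).
    return NormalDist(mu=1.0 + 2.0 * x, sigma=1.0).cdf(y)


# Hypothetical covariate samples for group 0 and group 1 (group 1 shifted upward).
x_group0 = [0.0, 0.5, 1.0, 1.5]
x_group1 = [x + 1.0 for x in x_group0]

grid = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Status quo: conditional law of group 0 integrated over covariates of group 0.
status_quo = [plug_in_cdf(toy_cond_cdf, x_group0, y) for y in grid]
# Counterfactual: same conditional law integrated over covariates of group 1.
counterfactual = [plug_in_cdf(toy_cond_cdf, x_group1, y) for y in grid]
```

When the covariate sample itself is only an estimate of the target distribution, the averaging step contributes its own sampling error, which is the second component in the limit process above.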

4.  Labor  Market  Institutions  and  the  Distribution  of  Wages 

The empirical application in this section draws its motivation from the influential article
by DiNardo, Fortin, and Lemieux (1996, DFL hereafter), which studied the effects of institutional
and labor market factors on the evolution of the U.S. wage distribution between
1979 and 1988. The goal of our empirical application is to complete and complement
DFL's analysis by using a wider range of techniques, including quantile regression and
distribution regression, and to provide confidence intervals for scalar-valued effects as well
as function-valued effects of the institutional and labor market factors, such as quantile,
distribution, and Lorenz policy effects.

We  use  the  same  dataset  as  in  DFL,  extracted  from  the  outgoing  rotation  groups  of  the 
Current  Population  Surveys  (CPS)  in  1979  and  1988.  The  outcome  variable  of  interest 
is the hourly log-wage in 1979 dollars. The regressors include a union status dummy,
nine education dummies interacted with experience, a quartic term in experience, two
occupation dummies, twenty industry dummies, and dummies for race, SMSA, marital
status, and part-time status. Following DFL we weight the observations by the product
of the CPS sampling weights and the hours worked. We analyze the data for men and
women separately.

The  major  factors  suspected  to  have  an  important  role  in  the  evolution  of  the  wage 
distribution between 1979 and 1988 are the minimum wage, whose real value declined by
27 percent, the level of unionization, which declined from 30 percent to 21 percent
in our sample, and the composition of the labor force, whose education levels and other
characteristics also changed substantially during this period. Thus, following DFL,
we decompose the total change in the US wage distribution into the sum of four effects:
(1) the effect of a change in minimum wage, (2) the effect of de-unionization, (3) the effect
of changes in the composition of the labor force, and (4) the price effect. The effect (1)
measures  changes  in  the  marginal  distribution  of  wages  that  occur  due  to  a  change  in  the 
minimum  wage;  the  effects  (2)  and  (3)  measure  changes  in  the  marginal  distribution  of 
wages  that  occur  due  to  a  change  in  the  distribution  of  a  particular  factor,  having  fixed 
the  distribution  of  other  factors  at  some  constant  level;  the  effect  (4)  measures  changes  in 
the  marginal  distribution  of  wages  that  occur  due  to  a  change  in  the  wage  structure,  or 
conditional  distribution  of  wages  given  worker  characteristics. 

Next we formally define these four effects as differences between appropriately chosen
counterfactual distribution functions. Let $F_{Y_t, m_s}^{U_r, Z_v}$ denote the counterfactual marginal
distribution function of log-wages $Y$ when the wage structure is as in year $t$, the minimum
wage, $m$, is at the level observed for year $s$, the distribution of union status, $U$, is as the
distribution observed in year $r$, and the distribution of other worker characteristics, $Z$, is
as the distribution observed in year $v$. We identify and estimate such counterfactual
distributions using the procedures described below. Given these counterfactual distributions,
we can decompose the observed total change in the distribution of wages between 1979
and 1988 into the sum of four effects:

$$
\begin{aligned}
F_{Y_{88}, m_{88}}^{U_{88}, Z_{88}} - F_{Y_{79}, m_{79}}^{U_{79}, Z_{79}}
={}& \underbrace{\left[F_{Y_{88}, m_{88}}^{U_{88}, Z_{88}} - F_{Y_{88}, m_{79}}^{U_{88}, Z_{88}}\right]}_{(1)}
+ \underbrace{\left[F_{Y_{88}, m_{79}}^{U_{88}, Z_{88}} - F_{Y_{88}, m_{79}}^{U_{79}, Z_{88}}\right]}_{(2)} \\
&+ \underbrace{\left[F_{Y_{88}, m_{79}}^{U_{79}, Z_{88}} - F_{Y_{88}, m_{79}}^{U_{79}, Z_{79}}\right]}_{(3)}
+ \underbrace{\left[F_{Y_{88}, m_{79}}^{U_{79}, Z_{79}} - F_{Y_{79}, m_{79}}^{U_{79}, Z_{79}}\right]}_{(4)}.
\end{aligned}
\qquad (4.1)
$$


The  first  component  is  the  effect  of  the  change  in  the  minimum  wage,  the  second  is  the 
effect  of  de-unionization,  the  third  is  the  effect  of  changes  in  worker  characteristics,  and 
the  fourth  is  the  price  effect.  As  stated  above,  we  see  that  the  effects  (2)  and  (3)  measure 
changes  in  the  marginal  distribution  of  wages  that  occur  due  to  a  change  in  the  distribution 
of  a  particular  factor,  having  fixed  the  distribution  of  other  factors  at  some  constant  level. 
The effect (4) captures changes in the wage structure or conditional distribution of wages
given  observed  characteristics;  in  particular,  it  captures  the  effect  of  changes  in  the  market 
returns  to  workers'  characteristics,  including  education  and  experience.  Finally,  we  discuss 
the  interpretation  of  the  minimum  wage  effect  (1)  in  detail  below. 

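The decomposition (4.1) is the distribution version of the Oaxaca-Blinder decomposition
for the mean. We obtain similar decompositions for other functionals $\phi(F_{Y_t, m_s}^{U_r, Z_v})$ of interest,
such as marginal quantiles and Lorenz curves, by making an appropriate substitution in
equation (4.1):

$$
\begin{aligned}
\phi\left(F_{Y_{88}, m_{88}}^{U_{88}, Z_{88}}\right) - \phi\left(F_{Y_{79}, m_{79}}^{U_{79}, Z_{79}}\right)
={}& \underbrace{\left[\phi\left(F_{Y_{88}, m_{88}}^{U_{88}, Z_{88}}\right) - \phi\left(F_{Y_{88}, m_{79}}^{U_{88}, Z_{88}}\right)\right]}_{(1)}
+ \underbrace{\left[\phi\left(F_{Y_{88}, m_{79}}^{U_{88}, Z_{88}}\right) - \phi\left(F_{Y_{88}, m_{79}}^{U_{79}, Z_{88}}\right)\right]}_{(2)} \\
&+ \underbrace{\left[\phi\left(F_{Y_{88}, m_{79}}^{U_{79}, Z_{88}}\right) - \phi\left(F_{Y_{88}, m_{79}}^{U_{79}, Z_{79}}\right)\right]}_{(3)}
+ \underbrace{\left[\phi\left(F_{Y_{88}, m_{79}}^{U_{79}, Z_{79}}\right) - \phi\left(F_{Y_{79}, m_{79}}^{U_{79}, Z_{79}}\right)\right]}_{(4)}.
\end{aligned}
\qquad (4.2)
$$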
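The sequential structure of the decomposition can be sketched in a few lines: each effect is the difference between two adjacent counterfactual quantities, so the four effects telescope to the total change. The function name `sequential_decomposition` and the numerical values below are made up for illustration and are not estimates from the paper.

```python
def sequential_decomposition(values):
    """Split a total change into four sequential effects.

    values: a scalar functional evaluated at five distributions, in this order:
        0. observed 1988,
        1. 1988 with the 1979 minimum wage,
        2. additionally with the 1979 union distribution,
        3. additionally with the 1979 worker characteristics,
        4. observed 1979.
    """
    labels = ["minimum wage", "de-unionization", "characteristics", "prices"]
    # Effect i is the difference between adjacent distributions i and i + 1.
    effects = {label: values[i] - values[i + 1] for i, label in enumerate(labels)}
    total = values[0] - values[-1]
    return effects, total


# Invented values of some dispersion measure (illustration only).
toy_values = [0.75, 0.70, 0.69, 0.66, 0.60]
effects, total = sequential_decomposition(toy_values)
```

Reordering the intermediate distributions changes the individual effects but not their sum, which is why the choice of sequential order matters for interpretation.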
In constructing the decompositions (4.1) and (4.2), we follow the same sequential order as
in DFL. Also, like DFL, we follow a partial equilibrium approach, but, unlike DFL, we
do not incorporate supply and demand factors in our analysis because they do not fit well
in our framework.

We next describe how to identify and estimate the various counterfactual distributions
appearing in (4.1). The first counterfactual distribution we need is $F_{Y_{88}, m_{79}}^{U_{88}, Z_{88}}$, the
distribution of wages that we would observe in 1988 if the real minimum wage were as high
as in 1979. Identifying this quantity requires additional assumptions. Following DFL,
the first strategy we employ is to assume the conditional wage density at or below the
minimum wage depends only on the value of the minimum wage, and the minimum wage
has no employment effects and no spillover effects on wages above its level. The second
strategy we employ completely avoids modeling the conditional wage distribution below
the minimum wage by simply censoring the observed wages below the minimum wage to
the value of the minimum wage. Under the first strategy, DFL show that

$$F_{Y_{88}, m_{79}}(y|u, z) = F_{Y_{88}, m_{88}}(y|u, z), \quad \text{if } y \geq m_{79}; \qquad (4.3)$$

where $F_{Y_t, m_s}(y|u, z)$ denotes the conditional distribution of wages at year $t$ given worker
characteristics when the level of the minimum wage is as in year $s$. Under the second
strategy, we have that

$$F_{Y_{88}, m_{79}}(y|u, z) = \begin{cases} 0, & \text{if } y < m_{79}; \\ F_{Y_{88}, m_{88}}(y|u, z), & \text{if } y \geq m_{79}. \end{cases} \qquad (4.4)$$

The choice of sequential order matters and can affect the relative importance of the four effects. We
report some results for the reverse sequential order in the Appendix.

We cannot identify this quantity from random variation in minimum wage, since the federal minimum
wage does not vary across individuals and varies little across states in the years considered.

Given either (4.3) or (4.4) we identify the counterfactual distribution of wages using the
representation:

$$F_{Y_{88}, m_{79}}^{U_{88}, Z_{88}}(y) = \int F_{Y_{88}, m_{79}}(y|u, z)\, dF_{UZ_{88}}(u, z), \qquad (4.5)$$

where $F_{UZ_t}$ is the joint distribution of worker characteristics and union status in year $t$.
We can then estimate this distribution using the plug-in principle. In particular, we estimate
the conditional distribution in expressions (4.3) and (4.4) using one of the regression
methods described below, and the distribution function $F_{UZ_{88}}$ using its empirical analog.

The other counterfactual marginal distributions we need are

$$F_{Y_{88}, m_{79}}^{U_{79}, Z_{88}}(y) = \int\!\!\int F_{Y_{88}, m_{79}}(y|u, z)\, dF_{U_{79}}(u|z)\, dF_{Z_{88}}(z) \qquad (4.6)$$

and

$$F_{Y_{88}, m_{79}}^{U_{79}, Z_{79}}(y) = \int F_{Y_{88}, m_{79}}(y|u, z)\, dF_{UZ_{79}}(u, z). \qquad (4.7)$$

Given either of our assumptions on the minimum wage, all the components of these distributions
are identified and we can estimate them using the plug-in principle. In particular, we
estimate the conditional distribution $F_{Y_{88}, m_{79}}(y|u, z)$ using one of the regression methods
described below, the conditional distribution $F_{U_{79}}(u|z)$, $u \in \{0, 1\}$, using logistic regression,
and $F_{Z_{88}}(z)$ and $F_{UZ_{79}}$ using the empirical distributions.

Formulas  (4.5)-(4.7)  giving  the  expressions  for  the  counterfactual  distributions  reflect 
the  assumptions  that  give  the  counterfactual  distributions  a  formal  causal  interpretation. 
Indeed,  we  assume  in  (4.6)  and  (4.7)  that  we  can  fix  the  relevant  conditional  distributions 
and  change  only  the  marginal  distributions  of  the  relevant  covariates.  In  (4.5),  we  also 
specify  how  the  conditional  distribution  of  wages  changes  with  the  level  of  the  minimum 
wage.  Note  that  we  directly  observe  the  marginal  distributions  appearing  on  the  left  side 
of  the  decomposition  (4.1)  and  estimate  them  using  the  plug-in  principle. 

To estimate the conditional distributions of wages we consider three different regression
methods: classical regression, linear quantile regression, and distribution regression with
a logit link. The classical regression, despite its wide use in the literature, is not appropriate
in this application due to substantial conditional heteroscedasticity in log wages
(Lemieux, 2006, and Angrist, Chernozhukov, and Fernandez-Val, 2006). The linear quantile
regression is more flexible, but it also has shortcomings in this application. First,
there is a considerable amount of rounding, especially at the level of the minimum wage,
which  makes  the  wage  variable  highly  discrete.  Second,  a  linear  model  for  the  conditional 
quantile  function  may  not  provide  a  good  approximation  to  the  conditional  quantiles  near 
the  minimum  wage,  where  the  conditional  quantile  function  may  be  highly  nonlinear.  The 
distribution  regression  approach  does  not  suffer  from  these  problems,  and  we  therefore 
employ  it  to  generate  the  main  empirical  results.  In  order  to  check  the  robustness  of 
our  empirical  results,  we  also  employ  the  censoring  approach  described  above.  We  set 
the  wages  below  the  minimum  wage  to  the  value  of  the  minimum  wage  and  then  apply 
censored  quantile  and  distribution  regressions  to  the  resulting  data.  In  what  follows,  we 
first  present  the  empirical  results  obtained  using  distribution  regression,  and  then  briefly 
compare  them  with  the  results  obtained  using  censored  quantile  regression  and  censored 
distribution  regression. 

We  present  our  empirical  results  in  Tables  1-3  and  Figures  1-9.  In  Figure  1,  we  compare 
the  empirical  distributions  of  wages  in  1979  and  1988.  In  Table  1,  we  report  the  estimation 
and  inference  results  for  the  decomposition  (4.2)  of  the  changes  in  various  measures  of  wage 
dispersion between 1979 and 1988 estimated using distribution regressions. Figures 2-7
refine these results by presenting estimates and 95% simultaneous confidence intervals
for several major functionals of interest, including the effects on entire quantile functions,
distribution functions, and Lorenz curves. We construct the simultaneous confidence bands
using 100 bootstrap replications and a grid of quantile indices {0.02, 0.021, ..., 0.98}. We
plot all of these function-valued effects against the quantile indices of wages. In Tables 2-3
and  Figures  8-9,  we  present  the  estimates  of  the  same  effects  as  in  Table  1  and  Figures 
2-3  estimated  using  various  alternative  methods,  such  as  censored  quantile  regression  and 
censored  distribution  regression.  Overall,  we  find  that  our  estimates,  confidence  intervals, 
and  robustness  checks  all  reinforce  the  findings  of  DFL,  giving  them  a  rigorous  econometric 
foundation.  Indeed,  we  provide  standard  errors  and  confidence  intervals,  without  which 
we  would  not  be  able  to  assess  the  statistical  significance  of  the  results.  Moreover,  we 
validate  the  results  with  a  wide  array  of  estimation  methods.  In  what  follows  below,  we 
discuss  each  of  our  results  in  more  detail. 

In  Figure  1,  we  present  estimates  and  uniform  confidence  intervals  for  the  marginal 
distributions  of  wages  in  1979  and  1988.  We  see  that  the  low  end  of  the  distribution  is 
significantly lower in 1988 while the upper end is significantly higher in 1988. This pattern
reflects the well-known increase in wage inequality during this period. Next we turn to the
decomposition of the total change into the sum of the four effects. For this decomposition
we focus mostly on quantile functions for comparability with recent studies and to facilitate
the interpretation. In Figures 2-3, we present estimates and uniform confidence intervals
for the total change in the marginal quantile function of wages and the four effects that
form a decomposition of this total change. We report the marginal quantile functions in
1979 and 1988 in the top left panels of Figures 2 and 3. In Figures 4-7, we plot analogous
results for the decomposition of the total change in marginal distribution functions and
Lorenz curves.

The estimation results parallel the results presented in DFL. Table A1 in the Appendix gives the
results for the decomposition in reverse order.

From  Figures  2  and  3,  we  see  that  the  contribution  of  union  status  to  the  total  change  is 
quantitatively  small  and  has  a  U-shaped  effect  across  the  quantile  function  for  men.  The 
magnitude and shape of this effect on the marginal quantiles between the first and last
decile sharply contrast with the quantitatively large and monotonically decreasing shape of
the effect of the union status on the conditional quantile function for this range of indexes
(Chamberlain, 1994), and illustrate the difference between conditional and unconditional
effects. In general, interpreting the unconditional effect of changes in the distribution of
a  covariate  requires  some  care,  because  the  covariate  may  change  only  over  certain  parts 
of  its  support.  For  example,  de-unionization  cannot  affect  those  who  were  not  unionized 
at  the  beginning  of  the  period,  which  is  70  percent  of  the  workers;  and  in  our  data,  the 
unionization  declines  from  30  to  21  percent,  thus  affecting  only  9  percent  of  the  workers. 
Thus,  even  though  the  conditional  impact  of  switching  from  union  to  non-union  status  can 
be  quantitatively  large,  it  has  a  quantitatively  small  effect  on  the  marginal  distribution 
since  only  9  percent  of  the  workers  are  affected. 

Discreteness of wage data implies that the quantile functions have jumps. To avoid this erratic
behavior in the graphical representations of the results, we display smoothed quantile functions. The
non-smoothed results are available from the authors. The quantile functions were smoothed using a bandwidth
of 0.015 and a Gaussian kernel. The results in Tables 1-3 and A1 have not been smoothed.

We find similar estimates to Chamberlain (1994) for the effect of union on the conditional quantile
function in our CPS data.

From Figures 2 and 3, we also see that the change in the distribution of worker characteristics
(other than union status) is responsible for a large part of the increase in wage
inequality in the upper tail of the distribution. The importance of these composition effects
has been recently stressed by Lemieux (2006) and Autor, Katz and Kearney (2008). The
composition effect is realized through at least two channels. The first channel operates
through between-group inequality. In our case, higher educated and more experienced
workers earn higher wages. By increasing their proportion, we induce a larger gap between
the lower and upper tails of the marginal wage distribution. The second channel
is that within-group inequality varies by group, so increasing the proportion of high-variance
groups increases the dispersion in the marginal distribution of wages. In our case,
higher educated and more experienced workers exhibit higher within-group inequality. By
increasing their proportion, we induce a higher inequality within the upper tail of the
distribution. To understand the effect of these channels on wage dispersion it is useful to
consider a linear quantile model $Y = X'\beta(U)$, where $X$ is independent of $U$. By the law
of total variance, we can decompose the variance of $Y$ into:

$$Var[Y] = E[\beta(U)]' \, Var[X] \, E[\beta(U)] + \mathrm{trace}\{E[XX'] \, Var[\beta(U)]\}. \qquad (4.8)$$

The first channel corresponds to changes in the first term of (4.8) where $Var[X]$ represents
the heterogeneity of the labor force (between-group inequality); whereas the second channel
corresponds to changes in the second term of (4.8) operating through the interaction of
between-group inequality $E[XX']$ and within-group inequality $Var[\beta(U)]$.
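The variance decomposition (4.8) can be checked exactly on a small discrete example. The sketch below enumerates a hypothetical joint law in which $X = (1, x)$ is independent of $U$; the supports, probabilities, and the coefficient function `beta` are assumptions chosen only for illustration.

```python
from itertools import product

# Hypothetical discrete laws (illustration only): X = (1, x) with an intercept.
x_support = [((1.0, 0.0), 0.5), ((1.0, 1.0), 0.3), ((1.0, 2.0), 0.2)]
u_support = [(k / 10, 1 / 9) for k in range(1, 10)]


def beta(u):
    # Hypothetical quantile coefficients: intercept and slope both rise with u.
    return (u, 1.0 + u * u)


def identity(v):
    return v


def mean_vec(support, f):
    return [sum(p * f(v)[i] for v, p in support) for i in range(2)]


def second_moment(support, f):
    return [[sum(p * f(v)[i] * f(v)[j] for v, p in support) for j in range(2)]
            for i in range(2)]


def covariance(support, f):
    m, s = mean_vec(support, f), second_moment(support, f)
    return [[s[i][j] - m[i] * m[j] for j in range(2)] for i in range(2)]


e_beta = mean_vec(u_support, beta)
var_beta = covariance(u_support, beta)
var_x = covariance(x_support, identity)
e_xx = second_moment(x_support, identity)

# First term of (4.8): E[beta]' Var[X] E[beta].
between = sum(e_beta[i] * var_x[i][j] * e_beta[j] for i in range(2) for j in range(2))
# Second term of (4.8): trace(E[XX'] Var[beta]).
within = sum(e_xx[i][j] * var_beta[j][i] for i in range(2) for j in range(2))

# Direct computation of Var[Y] for Y = X'beta(U), using independence of X and U.
ys = [(sum(a * b for a, b in zip(x, beta(u))), px * pu)
      for (x, px), (u, pu) in product(x_support, u_support)]
e_y = sum(p * y for y, p in ys)
var_y = sum(p * (y - e_y) ** 2 for y, p in ys)
```

In this toy example, raising the share of high-$x$ groups changes both terms: it alters $Var[X]$ in the first and $E[XX']$ in the second.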

In  Figures  2  and  3,  we  also  include  estimates  of  the  price  effect.  This  effect  captures 
changes  in  the  conditional  wage  structure.  It  represents  the  difference  we  would  observe 
if  the  distribution  of  worker  characteristics  and  union  status,  and  the  minimum  wage 
remained  unchanged  during  this  period.  This  effect  has  a  U-shaped  pattern,  which  is 
similar  to  the  pattern  Autor,  Katz  and  Kearney  (2006a)  find  for  the  period  between  1990 
and  2000.  They  relate  this  pattern  to  a  bi-polarization  of  employment  into  low  and  high 
skill  jobs.  However,  they  do  not  find  a  U-shaped  pattern  for  the  period  between  1980  and 
1990.  A  possible  explanation  for  the  apparent  absence  of  this  pattern  in  their  analysis 
might  be  that  the  declining  minimum  wage  masks  this  phenomenon.  In  our  analysis,  once 
we  control  for  this  temporary  factor,  we  do  uncover  the  U-shaped  pattern  for  the  price 
component  in  the  80s. 

In  Tables  2-3  and  Figures  8-9,  we  present  several  interesting  robustness  checks.  As  we 
mentioned  above,  the  assumptions  about  the  minimum  wage  are  particularly  delicate,  since 
the mechanism that generates wages strictly below this level is not clear; it could be measurement
error, non-coverage, or non-compliance with the law. To check the robustness of
the  results  to  the  DFL  assumptions  about  the  minimum  wage  and  to  our  semi-parametric 
model  of  the  conditional  distribution,  we  re-estimate  the  decomposition  using  censored 
linear  quantile  regression  and  censored  distribution  regression  with  a  logit  link,  using  the 
wage data censored below the minimum wage. For censored quantile regression, we use
Powell's (1986) censored quantile regression estimated using Chernozhukov and Hong's
(2002) algorithm. For censored distribution regression, we simply censor to zero the distribution
regression estimates of the conditional distributions below the minimum wage and
recompute the functionals of interest. Overall, we find the results are very similar for the
quantile and distribution regressions, and they are not very sensitive to the censoring.

5. Conclusion

This  paper  develops  methods  for  performing  inference  about  the  effect  on  an  outcome  of 
interest  of  a  change  in  either  the  distribution  of  policy-related  variables  or  the  relationship 
of  the  outcome  with  these  variables.  The  validity  of  the  proposed  inference  procedures 
in  large  samples  relies  only  on  the  applicability  of  a  functional  central  limit  theorem  for 
the estimator of the conditional distribution or conditional quantile function. This condition
holds for most important semiparametric estimators of conditional distribution and
quantile functions, such as classical, quantile, duration, and distribution regressions.
Appendix

This  Appendix  contains  proofs  and  additional  results.  Section  A  collects  preliminary 
lemmas  on  the  functional  delta  method  and  derives  the  functional  delta  method  for  any 
simulation  method,  extending  its  applicability  beyond  the  bootstrap.  Section  B  collects 
the  proofs  for  the  results  in  the  main  text  of  the  paper.  Section  C  gives  limit  distribution 
theory  for  policy  effects  estimators.  Section  D  presents  additional  results  for  the  case 
where  the  covariate  distributions  are  estimated.  These  results  complement  the  results  in 
the  main  text.  Section  E  derives  limit  theory,  including  Hadamard  differentiability,  for 
Z-processes  and  Section  F  applies  this  theory  to  the  principal  estimators  of  conditional 
distribution and quantile functions. These results establish the validity of bootstrap and
other resampling schemes for the entire quantile regression process, the entire distribution
regression process, and related processes arising in the estimation of various conditional
quantile and distribution functions. These results may be of substantial independent
interest.

We have additional results on quantile, distribution and Lorenz effects for the censored estimates;
these are available on request from the authors. We do not report them here to save space.

Appendix  A.  Functional  Delta  Method,  Bootstrap,  and  Other  Methods 

This  section  collects  preliminary  lemmas  on  the  functional  delta  method  and  derives  the 
functional  delta  method  for  any  simulation  method,  extending  its  applicability  beyond  the 
bootstrap. 

A.1. Some definitions and auxiliary results. We begin by quickly recalling from van
der Vaart and Wellner (1996) the details of the functional delta method.

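Definition 1 (Hadamard-differentiability). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces, with
$\mathbb{D}_0 \subset \mathbb{D}$. A map $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ is called Hadamard-differentiable at $\theta \in \mathbb{D}_\phi$ tangentially
to $\mathbb{D}_0$ if there is a continuous linear map $\phi'_\theta : \mathbb{D}_0 \mapsto \mathbb{E}$ such that

$$\frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n} \to \phi'_\theta(h), \quad n \to \infty,$$

for all sequences $t_n \to 0$ and $h_n \to h \in \mathbb{D}_0$ such that $\theta + t_n h_n \in \mathbb{D}_\phi$ for every $n$.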

This  notion  works  well  together  with  the  continuous  mapping  theorem. 

Lemma 2 (Extended continuous mapping theorem). Let $\mathbb{D}_n \subset \mathbb{D}$ be arbitrary subsets
and $g_n : \mathbb{D}_n \mapsto \mathbb{E}$ be arbitrary maps ($n \geq 0$), such that for every sequence $x_n \in \mathbb{D}_n$:
if $x_{n'} \to x \in \mathbb{D}_0$ along a subsequence, then $g_{n'}(x_{n'}) \to g_0(x)$. Then, for arbitrary maps
$X_n : \Omega_n \mapsto \mathbb{D}_n$ and every random element $X$ with values in $\mathbb{D}_0$ such that $g_0(X)$ is a random
element in $\mathbb{E}$:

(i) If $X_n \Rightarrow X$, then $g_n(X_n) \Rightarrow g_0(X)$;

(ii) If $X_n \to_p X$, then $g_n(X_n) \to_p g_0(X)$.

The  combination  of  the  previous  definition  and  lemma  is  known  as  the  functional  delta 
method. 

Lemma 3 (Functional delta-method). Let $\mathbb{D}_0$, $\mathbb{D}$, and $\mathbb{E}$ be normed spaces. Let $\phi : \mathbb{D}_\phi \subset
\mathbb{D} \mapsto \mathbb{E}$ be Hadamard-differentiable at $\theta$ tangentially to $\mathbb{D}_0$. Let $X_n : \Omega_n \mapsto \mathbb{D}_\phi$ be maps with
$r_n(X_n - \theta) \Rightarrow X$ in $\mathbb{D}$, where $X$ is separable and takes its values in $\mathbb{D}_0$, for some sequence
of constants $r_n \to \infty$. Then $r_n(\phi(X_n) - \phi(\theta)) \Rightarrow \phi'_\theta(X)$. If $\phi'_\theta$ is defined and continuous
on the whole of $\mathbb{D}$, then the sequence $r_n(\phi(X_n) - \phi(\theta)) - \phi'_\theta(r_n(X_n - \theta))$ converges to zero
in outer probability.
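As a standard textbook illustration of the functional delta method (it is not one of this paper's results, and the symbols $Q$, $f$, and $B$ below are introduced only for this example), consider the quantile map $\phi(F) = F^{-1}(\tau)$ for a fixed $\tau \in (0, 1)$. Assume $F$ is differentiable at $Q(\tau) = F^{-1}(\tau)$ with density $f(Q(\tau)) > 0$, and let $\hat F_n$ be the empirical distribution function of an i.i.d. sample, so that $\sqrt{n}(\hat F_n - F) \Rightarrow B \circ F$ with $B$ a standard Brownian bridge. Under these assumptions, the map is Hadamard-differentiable tangentially to functions continuous at $Q(\tau)$, and the delta method yields:

$$
\begin{aligned}
\phi'_F(h) &= -\frac{h(Q(\tau))}{f(Q(\tau))}, \\
\sqrt{n}\left(\hat F_n^{-1}(\tau) - Q(\tau)\right) &\Rightarrow -\frac{B(\tau)}{f(Q(\tau))} \sim N\left(0, \frac{\tau(1 - \tau)}{f(Q(\tau))^2}\right).
\end{aligned}
$$

The second line uses $B(F(Q(\tau))) = B(\tau)$ and $Var[B(\tau)] = \tau(1 - \tau)$.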

The applicability of the method is greatly enhanced by the fact that Hadamard differentiation
obeys the chain rule.

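Lemma 4 (Chain rule). If $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}_\psi$ is Hadamard-differentiable at $\theta \in \mathbb{D}_\phi$
tangentially to $\mathbb{D}_0$ and $\psi : \mathbb{E}_\psi \mapsto \mathbb{F}$ is Hadamard-differentiable at $\phi(\theta)$ tangentially to
$\phi'_\theta(\mathbb{D}_0)$, then $\psi \circ \phi : \mathbb{D}_\phi \mapsto \mathbb{F}$ is Hadamard-differentiable at $\theta$ tangentially to $\mathbb{D}_0$ with
derivative $\psi'_{\phi(\theta)} \circ \phi'_\theta$.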

Another technical result to be used in the sequel concerns the equivalence of continuous
and uniform convergence.

Lemma 5 (Uniform convergence via continuous convergence). Let $\mathbb{D}$ and $\mathbb{E}$ be complete
separable metric spaces, with $\mathbb{D}$ compact. Suppose $f : \mathbb{D} \mapsto \mathbb{E}$ is continuous. Then a
sequence of functions $f_n : \mathbb{D} \mapsto \mathbb{E}$ converges to $f$ uniformly on $\mathbb{D}$ if and only if for any
convergent sequence $x_n \to x$ in $\mathbb{D}$ we have that $f_n(x_n) \to f(x)$.

Proof of Lemmas 2-4: See van der Vaart and Wellner (1996), Chapters 1.11 and 3.9. $\square$

Proof of Lemma 5: See, for example, Resnick (1987), page 2. $\square$

A.2. Functional delta-method for bootstrap and other simulation methods. Let
$\mathcal{F}_n = (W_1, ..., W_n)$ denote the data. Consider sequences of random elements $V_n = V_n(\mathcal{F}_n)$,
the original empirical process. In a normed space $\mathbb{D}$, the sequence $\sqrt{n}(V_n - V)$ converges
unconditionally to the process $G$. Let the sequence of random elements $\tilde V_n$ be

$$\tilde V_n = V_n + G_n / \sqrt{m}, \qquad (A.1)$$

where $m = m(n)$ is a possibly random sequence such that $m/m_0 \to_p 1$ for some sequence
of constants $m_0 \to \infty$ such that $m_0/n \to c \geq 0$, and the "draw" $G_n$ is produced by
bootstrap, simulation, or any other consistent method that guarantees that the sequence
$G_n$ converges conditionally given $\mathcal{F}_n$ in distribution to a tight random element $G$,

$$\sup_{h \in BL_1(\mathbb{D})} \left| E_{|\mathcal{F}_n} h(G_n)^* - E h(G) \right| \to 0, \qquad (A.2)$$

in outer probability, where $BL_1(\mathbb{D})$ denotes the space of functions with Lipschitz norm at
most 1 and $E_{|\mathcal{F}_n}$ denotes the conditional expectation given the data. In the definition, we
can take $G$ to be independent of $\mathcal{F}_n$.
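To make the notion of a "draw" concrete, the sketch below produces one empirical-bootstrap draw for the simplest case, where the estimator is an empirical distribution function on a grid and $m = n$. The helper names `ecdf` and `bootstrap_draw` and the simulated sample are hypothetical; the estimators covered by the theory (quantile and distribution regression processes) are more involved.

```python
import math
import random


def ecdf(sample, grid):
    """Empirical distribution function of `sample` evaluated on `grid`."""
    n = len(sample)
    return [sum(w <= y for w in sample) / n for y in grid]


def bootstrap_draw(sample, grid, rng):
    """One empirical-bootstrap draw: sqrt(n) * (resampled ECDF - original ECDF)."""
    n = len(sample)
    # Resample n observations with replacement from the data.
    resample = [rng.choice(sample) for _ in range(n)]
    v_n = ecdf(sample, grid)
    v_tilde = ecdf(resample, grid)
    return [math.sqrt(n) * (a - b) for a, b in zip(v_tilde, v_n)]


# Simulated data (illustration only).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
grid = [-1.0, 0.0, 1.0, max(sample)]
draw = bootstrap_draw(sample, grid, rng)
```

Repeating `bootstrap_draw` many times gives a simulated approximation to the law of the limit process, conditional on the data.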


The  random  scaling  is  needed  to  cover  wild  bootstrap,  for  example. 
Given a map $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$, we wish to show that

$$\sup_{h \in BL_1(\mathbb{E})} \left| E_{|\mathcal{F}_n} h\left(\sqrt{m}(\phi(\tilde V_n) - \phi(V_n))\right)^* - E h(\phi'_V(G)) \right| \to 0, \qquad (A.3)$$

in outer probability.


Lemma 6 (Delta-method for bootstrap and other simulation methods). Let $\mathbb{D}_0$, $\mathbb{D}$, and
$\mathbb{E}$ be normed spaces, with $\mathbb{D}_0 \subset \mathbb{D}$. Let $\phi : \mathbb{D}_\phi \subset \mathbb{D} \mapsto \mathbb{E}$ be Hadamard-differentiable at $V$
tangentially to $\mathbb{D}_0$. Let $V_n$ and $\tilde V_n$ be maps as indicated previously with values in $\mathbb{D}_\phi$ such
that $\sqrt{n}(V_n - V) \Rightarrow G$ and (A.2) holds in outer probability, where $G$ is separable and takes
its values in $\mathbb{D}_0$. Then (A.3) holds in outer probability.

Proof of Lemma 6: The proof generalizes the functional delta-method for empirical
bootstrap in Theorem 3.9.11 of van der Vaart and Wellner (1996) to exchangeable bootstrap.
This expands the applicability of the delta-method to a wide variety of resampling and
simulation schemes that are special cases of exchangeable bootstrap, including empirical
bootstrap, Bayesian bootstrap, wild bootstrap, k out of n bootstrap, and subsampling
bootstrap (see next section for details).

Without loss of generality, assume that the derivative $\phi'_V : D \mapsto E$ is defined and continuous on the whole space. Otherwise, replace $E$ by its second dual $E^{**}$ and the derivative by an extension $\phi'_V : D \mapsto E^{**}$. For every $h \in BL_1(E)$, the function $h \circ \phi'_V$ is contained in $BL_{\|\phi'_V\|}(D)$. Thus (A.2) implies $\sup_{h \in BL_1(E)} | E_{|\mathcal{F}_n} h(\phi'_V(G_n))^* - E h(\phi'_V(G)) | \to 0$, in outer probability. Next


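$$\sup_{h \in BL_1(E)} \left| E_{|\mathcal{F}_n} h\big(\sqrt{m}(\phi(V_n^*) - \phi(\hat V_n))\big)^* - E_{|\mathcal{F}_n} h(\phi'_V(G_n))^* \right| \le \epsilon + 2 P_{|\mathcal{F}_n}\Big( \big\| \sqrt{m}(\phi(V_n^*) - \phi(\hat V_n)) - \phi'_V\big(\sqrt{m}(V_n^* - \hat V_n)\big) \big\|^* > \epsilon \Big). \qquad (A.4)$$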


The  theorem  is  proved  once  it  has  been  shown  that  the  conditional  probability  on  the  right 
converges  to  zero  in  outer  probability. 

Both sequences $\sqrt{m}(\hat V_n - V)$ and $\sqrt{m}(V_n^* - V)$ converge (unconditionally) in distribution to separable random elements that concentrate on the space $D_0$. The first sequence converges by assumption and Slutsky's theorem when $m/m_0 \to_p 1$ and $m_0/n \to c > 0$, and converges to zero when $m_0/n \to 0$ by assumption and Slutsky's theorem. The second sequence converges, by noting that

$$\sqrt{m}(V_n^* - V) = \sqrt{m}(V_n^* - \hat V_n) + \sqrt{m}(\hat V_n - V)$$

and that $E | E_{|\mathcal{F}_n} h(\sqrt{m}(V_n^* - \hat V_n)^* + t_n) - E_{|\mathcal{F}_n} h(G + t_n) | \le \sup_{h \in BL_1(D)} E | E_{|\mathcal{F}_n} h(\sqrt{m}(V_n^* - \hat V_n))^* - E_{|\mathcal{F}_n} h(G) | = \sup_{h \in BL_1(D)} E | E_{|\mathcal{F}_n} h(G_n)^* - E_{|\mathcal{F}_n} h(G) |$, which converges to zero by (A.2), and by $E_{|\mathcal{F}_n} h(G) = E h(G)$ due to independence of $G$ from $\mathcal{F}_n$.

By Lemma 3,

$$\sqrt{m}(\phi(\hat V_n) - \phi(V)) = \phi'_V\big(\sqrt{m}(\hat V_n - V)\big) + o_p^*(1),$$

$$\sqrt{m}(\phi(V_n^*) - \phi(V)) = \phi'_V\big(\sqrt{m}(V_n^* - V)\big) + o_p^*(1).$$

Subtract these equations to conclude that the sequence $\sqrt{m}(\phi(V_n^*) - \phi(\hat V_n)) - \phi'_V(\sqrt{m}(V_n^* - \hat V_n))$ converges unconditionally to zero in outer probability. Thus, the conditional probability on the right in (A.4) converges to zero in outer mean. □

A.3. Exchangeable Bootstrap. Let $(W_1, \ldots, W_n)$ denote the i.i.d. data. Next we define the collection of exchangeable bootstrap methods that we can employ for inference. For each $n$, let $(e_{n1}, \ldots, e_{nn})$ be an exchangeable, nonnegative random vector. Exchangeable bootstrap uses the components of this vector as random sampling weights in place of constant weights $(1, \ldots, 1)$. A simple way to think of exchangeable bootstrap is as sampling each variable $W_i$ the number of times equal to $e_{ni}$, albeit without requiring $e_{ni}$ to be integer-valued. Given an empirical process $V_n(f) = \frac{1}{n} \sum_{i=1}^n f(W_i)$, we define an exchangeable bootstrap draw of this process as

where $\bar e_n = \sum_{i=1}^n e_{ni}/n$. This ensures that each draw of $V_n^*$ assigns nonnegative weights to each observation, which is important in applications of bootstrap to extremum estimators to preserve convexity of criterion functions. We assume that, for some $\epsilon > 0$,

$$\sup_n E[e_{n1}^{2+\epsilon}] < \infty, \qquad n^{-1} \sum_{i=1}^n (e_{ni} - \bar e_n)^2 \to_p 1, \qquad \bar e_n \to_p c > 0, \qquad (A.6)$$

where the first two conditions are standard, see van der Vaart and Wellner (1996), and the last one is needed to apply the previous lemma. Let us consider the following special cases:

(1) The standard empirical bootstrap corresponds to the case where $(e_{n1}, \ldots, e_{nn})$ is a multinomial vector with parameters $n$ and probabilities $(1/n, \ldots, 1/n)$, so that $\bar e_n = 1$ and $m = n$.

(2) The Bayesian bootstrap corresponds to the case where $U_1, \ldots, U_n$ are i.i.d. nonnegative random variables, e.g. unit exponential, with $E[U_1^{2+\epsilon}] < \infty$ for some $\epsilon > 0$, and $e_{ni} = U_i/\bar U_n$, so that $\bar e_n = 1$ and $m = n$.

(3) The wild bootstrap corresponds to the case where $e_{n1}, \ldots, e_{nn}$ are i.i.d. vectors with $E[e_{n1}^{2+\epsilon}] < \infty$ for some $\epsilon > 0$, and $\mathrm{Var}[e_{n1}] = 1$, so that $m/n = \bar e_n \to_p E e_{n1} > 0$ and $m_0 = n E e_{n1} \to \infty$.

(4) The $k$ out of $n$ bootstrap resamples $k < n$ observations from $W_1, \ldots, W_n$ with replacement. This corresponds to letting $(e_{n1}, \ldots, e_{nn})$ be equal to $\sqrt{n/k}$ times multinomial vectors with parameters $k$ and probabilities $(1/n, \ldots, 1/n)$. The condition (A.6) on the weights holds if $k \to \infty$, so that $\bar e_n = k/n \to c > 0$ and $m = k \to \infty$.

(5) The subsampling bootstrap corresponds to resampling $k < n$ observations from $W_1, \ldots, W_n$ without replacement. This corresponds to letting $(e_{n1}, \ldots, e_{nn})$ be a row of $k$ times the number $n (n-k)^{-1/2} k^{-1/2}$ and $n - k$ times the number 0, ordered at random, independent of the $W_i$'s. The condition (A.6) on the weights holds if both $k \to \infty$ and $n - k \to \infty$. In this case $\bar e_n = k/(n-k) \to c > 0$ and $m = nk/(n-k) \to \infty$.
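The weight schemes in cases (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the unit exponential law used for the wild bootstrap is just one nonnegative choice with unit variance, and normalizing the weighted mean by the sum of the weights is an assumption made for this example, not the paper's definition of the bootstrap draw.

```python
import random


def empirical_weights(n, rng):
    """Case (1): multinomial counts with n trials and equal cell probabilities 1/n."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return [float(c) for c in counts]


def bayesian_weights(n, rng):
    """Case (2): unit exponential draws divided by their sample mean."""
    u = [rng.expovariate(1.0) for _ in range(n)]
    u_bar = sum(u) / n
    return [ui / u_bar for ui in u]


def wild_weights(n, rng):
    """Case (3): i.i.d. nonnegative weights with variance 1 (unit exponential, an assumed choice)."""
    return [rng.expovariate(1.0) for _ in range(n)]


def weighted_mean_draw(data, weights):
    """One reweighted draw of the sample mean; dividing by the weight sum is an assumption."""
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)
```

For example, `weighted_mean_draw(data, bayesian_weights(len(data), random.Random(1)))` produces one Bayesian bootstrap draw of the sample mean of `data`; repeating the call with fresh weights gives the simulated distribution.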

As a consequence of Lemma 6, we obtain the following result, which might be of independent interest.

Lemma 7 (Functional delta method for exchangeable bootstrap). The exchangeable bootstrap method described above satisfies condition (A.2), and therefore the conclusions of Lemma 6 about validity of the functional delta method apply to this method.

Proof of Lemma 7: By Lemma 6, we only need to verify condition (A.2), which follows by Theorem 3.6.13 of van der Vaart and Wellner (1996). □

Appendix B. Inference Theory for Counterfactual Estimators (Proofs)

This section collects the proofs for the results in the main text of the paper.

B.1. Notation. Define $Y_x := Q_Y(U|x)$, where $U \sim \mathrm{Uniform}(\mathcal{U})$ with $\mathcal{U} = (0,1)$. Denote by $\mathcal{Y}_x$ the support of $Y_x$, $\mathcal{YX} := \{(y,x) : y \in \mathcal{Y}_x, x \in \mathcal{X}\}$, and $\mathcal{UX} := \mathcal{U} \times \mathcal{X}$. We assume throughout that $\mathcal{Y}_x \subset \mathcal{Y}$, which is a compact subset of $\mathbb{R}$, and that $x \in \mathcal{X}$, a compact subset of a Euclidean space. In what follows, $\ell^\infty(\mathcal{UX})$ denotes the set of bounded and measurable functions $h : \mathcal{UX} \mapsto \mathbb{R}$, and $C(\mathcal{UX})$ denotes the set of continuous functions $h : \mathcal{UX} \mapsto \mathbb{R}$.

B.2. Uniform Hadamard differentiability of conditional distribution functions with respect to the conditional quantile functions. The following lemma establishes the Hadamard differentiability of the conditional distribution function with respect to the conditional quantile function. We use this result to prove Lemma 1 in the main text and to derive the limit distribution for the policy estimators based on conditional quantile models. We drop the dependence on the group index to simplify the notation.
Lemma 8 (Hadamard derivative of $F_Y(y|x)$ with respect to $Q_Y(u|x)$). Define $F_Y(y|x, h_t) := \int_0^1 1\{Q_Y(u|x) + t h_t(u|x) \le y\} du$. Under condition C, as $t \searrow 0$,

$$D_{h_t}(y|x, t) = \frac{F_Y(y|x, h_t) - F_Y(y|x)}{t} \to D_h(y|x) := -f_Y(y|x)\, h(F_Y(y|x)|x).$$

The convergence holds uniformly in any compact subset of $\mathcal{YX} := \{(y,x) : y \in \mathcal{Y}_x, x \in \mathcal{X}\}$, for every $\|h_t - h\|_\infty \to 0$, where $h_t \in \ell^\infty(\mathcal{UX})$, and $h \in C(\mathcal{UX})$.

Proof of Lemma 8: We have that for any $\delta > 0$, there exists $\epsilon > 0$ such that for $u \in B_\epsilon(F_Y(y|x))$ and for small enough $t > 0$

$$1\{Q_Y(u|x) + t h_t(u|x) \le y\} \le 1\{Q_Y(u|x) + t (h(F_Y(y|x)|x) - \delta) \le y\};$$

whereas for all $u \notin B_\epsilon(F_Y(y|x))$,

$$1\{Q_Y(u|x) + t h_t(u|x) \le y\} = 1\{Q_Y(u|x) \le y\}.$$

Therefore, for small enough $t > 0$

$$\frac{\int_0^1 1\{Q_Y(u|x) + t h_t(u|x) \le y\} du - \int_0^1 1\{Q_Y(u|x) \le y\} du}{t} \le \int_{B_\epsilon(F_Y(y|x))} \frac{1\{Q_Y(u|x) + t (h(F_Y(y|x)|x) - \delta) \le y\} - 1\{Q_Y(u|x) \le y\}}{t} du, \qquad (B.1)$$

which by the change of variable $\tilde y = Q_Y(u|x)$ is equal to

$$\frac{1}{t} \int_{J \cap [y,\, y - t(h(F_Y(y|x)|x) - \delta)]} f_Y(\tilde y|x)\, d\tilde y,$$

where $J$ is the image of $B_\epsilon(F_Y(y|x))$ under $u \mapsto Q_Y(\cdot|x)$. The change of variable is possible because $Q_Y(\cdot|x)$ is one-to-one between $B_\epsilon(F_Y(y|x))$ and $J$.

Fixing $\epsilon > 0$, for $t \searrow 0$, we have that $J \cap [y, y - t(h(F_Y(y|x)|x) - \delta)] = [y, y - t(h(F_Y(y|x)|x) - \delta)]$, and $f_Y(\tilde y|x) \to f_Y(y|x)$ as $F_Y(\tilde y|x) \to F_Y(y|x)$. Therefore, the right hand term in (B.1) is no greater than

$$-f_Y(y|x)\,(h(F_Y(y|x)|x) - \delta) + o(1).$$

Similarly $-f_Y(y|x)\,(h(F_Y(y|x)|x) + \delta) + o(1)$ bounds (B.1) from below. Since $\delta > 0$ can be made arbitrarily small, the result follows.

To show that the result holds uniformly in $(y,x) \in K$, a compact subset of $\mathcal{YX}$, we use Lemma 5. Take a sequence of $(y_t, x_t)$ in $K$ that converges to $(y,x) \in K$; then the preceding argument applies to this sequence, since the function $(y,x) \mapsto -f_Y(y|x)\, h(F_Y(y|x)|x)$ is uniformly continuous on $K$. This result follows by the assumed continuity of $h(u|x)$, $F_Y(y|x)$ and $f_Y(y|x)$ in both arguments, and the compactness of $K$. □
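As a numerical sanity check of Lemma 8, consider a hypothetical special case that is not taken from the paper: $Y$ given $x$ is uniform on $(0,1)$, so that $Q_Y(u|x) = u$, $F_Y(y|x) = y$ and $f_Y(y|x) = 1$, and the perturbation is $h(u|x) = 1 + u$. In that case the difference quotient equals $-(1+y)/(1+t)$ exactly, which tends to $-f_Y(y|x) h(F_Y(y|x)|x) = -(1+y)$. The sketch below approximates the integral over $u$ by a midpoint rule; all function names are illustrative.

```python
def perturbed_cdf(y, t, quantile, h, grid_size=200_000):
    """Midpoint-rule approximation of the integral over (0,1) of 1{Q(u) + t*h(u) <= y} du."""
    hits = 0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size
        if quantile(u) + t * h(u) <= y:
            hits += 1
    return hits / grid_size


def difference_quotient(y, t, quantile, h):
    """(F_Y(y | x, h) - F_Y(y | x)) / t for a fixed direction h."""
    perturbed = perturbed_cdf(y, t, quantile, h)
    baseline = perturbed_cdf(y, 0.0, quantile, h)
    return (perturbed - baseline) / t


def uniform_quantile(u):
    """Quantile function of the uniform distribution on (0,1) (assumed example)."""
    return u


def direction(u):
    """Assumed continuous direction h(u) = 1 + u."""
    return 1.0 + u


def hadamard_limit(y):
    """Limit -f_Y(y) * h(F_Y(y)) in the uniform example, where f_Y = 1 and F_Y(y) = y."""
    return -1.0 * direction(y)
```

Evaluating `difference_quotient(0.5, t, uniform_quantile, direction)` for decreasing `t` should approach `hadamard_limit(0.5)`, up to the discretization error of the grid divided by `t`.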
B.3. Proof of Lemma 1. This result follows by the Hadamard differentiability of the conditional distribution function with respect to the conditional quantile function in Lemma 8, Condition Q, and the functional delta method in Lemma 3. □

B.4. Proof of Theorem 1. The joint uniform convergence result follows from Condition D by the extended continuous mapping theorem in Lemma 2, since the integral is a continuous operator. Gaussianity of the limit process follows from linearity of the integral. □

B.5. Proof of Theorem 2. The joint uniform convergence result and Gaussianity of the limit process follow from Theorem 1 by the functional delta method in Lemma 3, since the quantile operator is Hadamard differentiable (see, e.g., Doss and Gill, 1992). □

B.6. Proof of Corollary 1. This result follows from Theorem 2 by the extended continuous mapping theorem in Lemma 2. □

B.7. Proof of Corollary 2. This result follows from Theorem 1 by the extended continuous mapping theorem in Lemma 2. □

B.8. Proof of Corollary 3. This result follows from Theorem 1 by the functional delta method in Lemma 3 and the chain rule for Hadamard differentiable functionals in Lemma 4. □

B.9. Proof of Theorem 3. This result follows from the functional delta method for the bootstrap and other simulation methods in Lemma 6. □

Appendix  C.  Limit  distribution  for  the  estimators  of  the  effects 

For policy interventions that can be implemented either as a known transformation of the covariate, $X_1 = g(X_0)$, or as a change in the conditional distribution of $Y$ given $X$, we can also identify and estimate the distribution of the effect of the policy, $\Delta_j^k = Y_j^k - Y_0^0$, $j,k \in \{0,1\}$, under Condition RP stated in the main text. The following results provide estimators for the distribution and quantile functions of the effects and limit distribution theory for them. Let $\mathcal{D} = \{\delta \in \mathbb{R} : \delta = y - \tilde y,\ y \in \mathcal{Y},\ \tilde y \in \mathcal{Y}\}$.

Lemma 9 (Limit distribution for estimators of conditional distribution and quantile functions). Let $\hat Q_{\Delta_0}(u|x) = \hat Q_{Y_0}(u|g(x)) - \hat Q_{Y_0}(u|x)$ and $\hat Q_{\Delta_1}(u|x) = \hat Q_{Y_1}(u|x) - \hat Q_{Y_0}(u|x)$ be estimators of the conditional quantile function of the effect $Q_{\Delta_j}(u|x)$, $j \in \{0,1\}$. Under the conditions C, Q, and RP, we have:

$$\sqrt{n}\big(\hat Q_{\Delta_j}(u|x) - Q_{\Delta_j}(u|x)\big) \Rightarrow V_{\Delta_j}(u,x), \quad j \in \{0,1\},$$

in $\ell^\infty((0,1) \times \mathcal{X})$, where $V_{\Delta_0}(u,x) := \sqrt{\lambda_0}\,[V_0(u, g(x)) - V_0(u,x)]$ and $V_{\Delta_1}(u,x) := \sqrt{\lambda_1} V_1(u,x) - \sqrt{\lambda_0} V_0(u,x)$. The Gaussian processes $(u,x) \mapsto V_{\Delta_j}(u,x)$, $j \in \{0,1\}$, have zero mean and covariance function $\Omega_V^{jr}(u,x,\tilde u,\tilde x) := E[V_{\Delta_j}(u,x) V_{\Delta_r}(\tilde u,\tilde x)]$, for $j,r \in \{0,1\}$.

Let $\hat F_{\Delta_j}(\delta|x) = \int_0^1 1\{\hat Q_{\Delta_j}(u|x) \le \delta\} du$ be an estimator of the conditional distribution of the effects $F_{\Delta_j}(\delta|x)$, for $j \in \{0,1\}$. Under the conditions C, Q, and RP, we have:

$$\sqrt{n}\big(\hat F_{\Delta_j}(\delta|x) - F_{\Delta_j}(\delta|x)\big) \Rightarrow -f_{\Delta_j}(\delta|x)\, V_{\Delta_j}(F_{\Delta_j}(\delta|x), x) =: Z_{\Delta_j}(\delta,x), \quad j \in \{0,1\},$$

in $\ell^\infty(\mathcal{D} \times \mathcal{X})$, and $(\delta,x) \mapsto Z_{\Delta_j}(\delta,x)$, $j \in \{0,1\}$, have zero mean and covariance function $\Omega_Z^{jr}(\delta,x,\tilde\delta,\tilde x) := E[Z_{\Delta_j}(\delta,x) Z_{\Delta_r}(\tilde\delta,\tilde x)]$, for $j,r \in \{0,1\}$. The conditional density of the effect, $f_{\Delta_j}(\delta|x)$, is assumed to be bounded above and away from zero.

Proof of Lemma 9. The uniform convergence result for the conditional quantile processes $\sqrt{n}(\hat Q_{\Delta_j}(u|x) - Q_{\Delta_j}(u|x))$, $j \in \{0,1\}$, follows from Conditions Q and RP by the extended continuous mapping theorem in Lemma 2. Uniform convergence of the conditional distribution processes $\sqrt{n}(\hat F_{\Delta_j}(\delta|x) - F_{\Delta_j}(\delta|x))$, $j \in \{0,1\}$, follows from the convergence of the quantile process by the functional delta method in Lemma 3. The Hadamard differentiability of $F_{\Delta_j}(\delta|x)$ with respect to $Q_{\Delta_j}(u|x)$ can be established using the same argument as in the proof of Lemma 8. □

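Theorem 4 (Limit distribution for estimators of the marginal distribution and quantile functions). Under the conditions M, C, Q, and RP, the estimators $\hat F_{\Delta_j}^k(\delta) = \int_{\mathcal{X}} \hat F_{\Delta_j}(\delta|x)\, dF_{X_k}(x)$ of the marginal distributions of the effects $F_{\Delta_j}^k(\delta)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\big(\hat F_{\Delta_j}^k(\delta) - F_{\Delta_j}^k(\delta)\big) \Rightarrow \int_{\mathcal{X}} Z_{\Delta_j}(\delta,x)\, dF_{X_k}(x) =: Z_{\Delta_j}^k(\delta), \quad j,k \in \{0,1\},$$

in $\ell^\infty(\mathcal{D})$, where $\delta \mapsto Z_{\Delta_j}^k(\delta)$, $j,k \in \{0,1\}$, have zero mean and covariance function $\Omega_Z^{jkrs}(\delta,\tilde\delta) := E[Z_{\Delta_j}^k(\delta) Z_{\Delta_r}^s(\tilde\delta)]$, for $j,k,r,s \in \{0,1\}$.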


In the distribution approach, $\hat Q_{Y_j}(u|x)$ can be obtained by inversion of the estimator of the conditional distribution.

This assumption rules out degenerate distributions of the effects, such as constant policy effects. These "distributions" can be estimated using standard regression methods.
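Under the conditions M, C, Q, and RP, the estimators $\hat Q_{\Delta_j}^k(u) = \inf\{\delta : \hat F_{\Delta_j}^k(\delta) \ge u\}$ of the marginal quantile functions of the effects $Q_{\Delta_j}^k(u)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\big(\hat Q_{\Delta_j}^k(u) - Q_{\Delta_j}^k(u)\big) \Rightarrow -Z_{\Delta_j}^k(Q_{\Delta_j}^k(u)) / f_{\Delta_j}^k(Q_{\Delta_j}^k(u)) =: V_{\Delta_j}^k(u), \quad j,k \in \{0,1\},$$

in $\ell^\infty((0,1))$, where $f_{\Delta_j}^k(\delta) = \int_{\mathcal{X}} f_{\Delta_j}(\delta|x)\, dF_{X_k}(x)$ and $u \mapsto V_{\Delta_j}^k(u)$, $j,k \in \{0,1\}$, have zero mean and covariance function $\Omega_V^{jkrs}(u,\tilde u) := E[V_{\Delta_j}^k(u) V_{\Delta_r}^s(\tilde u)]$, for $j,k,r,s \in \{0,1\}$.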
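The quantile estimator in the theorem is the left-inverse of the estimated distribution function, $\inf\{\delta : \hat F(\delta) \ge u\}$. A minimal sketch of that inversion, assuming the estimated distribution function has already been evaluated on an increasing grid of points (the function name and the grid representation are assumptions of this example):

```python
from bisect import bisect_left


def left_inverse(grid, cdf_values, u):
    """Return the smallest grid point whose CDF value is at least u.

    grid must be increasing and cdf_values nondecreasing, with the same length.
    """
    # bisect_left finds the first index with cdf_values[index] >= u.
    index = bisect_left(cdf_values, u)
    if index == len(grid):
        raise ValueError("u exceeds the largest CDF value on the grid")
    return grid[index]
```

On flat stretches of the distribution function the left-inverse returns the left end of the stretch, which is why the infimum appears in the definition.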

Proof of Theorem 4. The uniform convergence result for the marginal distribution functions follows from the convergence of the conditional processes in Lemma 9 by the extended continuous mapping theorem in Lemma 2, since the integral is a continuous operator. Gaussianity of the limit process follows from linearity of the integral. The uniform convergence result for the quantile function follows from the convergence of the distribution function by the functional delta method in Lemma 3, since the quantile operator is Hadamard differentiable (see, e.g., Doss and Gill, 1992). □

Appendix D. Inference Theory for Counterfactual Estimators: The Case with Estimated Covariate Distributions

This section presents additional results for the case where the covariate distributions are estimated. These results complement the analysis in the main text.

D.1. Limit theory, bootstrap, and other simulation methods. We start by restating Condition D to incorporate the assumptions about the estimators of the covariate distributions.

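Condition DC. (a) Let $\hat Z_j(y,x) := \sqrt{n}(\hat F_{Y_j}(y|x) - F_{Y_j}(y|x))$ and $\hat G_{X_k}(f) := \sqrt{n} \int f\, d(\hat F_{X_k}(x) - F_{X_k}(x))$, where $\hat F_{X_k}$ are estimated probability measures, for $j,k \in \{0,1\}$. These measures must support the P-Donsker property, namely

$$(\hat Z_0, \hat Z_1, \hat G_{X_0}, \hat G_{X_1}) \Rightarrow (\sqrt{\lambda_0} Z_0, \sqrt{\lambda_1} Z_1, \sqrt{\lambda_0} G_{X_0}, \sqrt{\lambda_1} G_{X_1})$$

in the space $\ell^\infty(\mathcal{Y} \times \mathcal{X}) \times \ell^\infty(\mathcal{Y} \times \mathcal{X}) \times \ell^\infty(\mathcal{F}) \times \ell^\infty(\mathcal{F})$, for each $F_X$-Donsker class $\mathcal{F}$, where the right hand side is a zero mean Gaussian process and $\lambda_j$ is the limit of the ratio of the sample size in group $j$ to the total sample size $n$, for $j \in \{0,1\}$.

(b) The function class $\{F_{Y_j}(y|\cdot),\ y \in \mathcal{Y}\}$ is $F_{X_k}$-Donsker, for $j,k \in \{0,1\}$.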

The condition on the estimated measure is weak and is satisfied when $\hat F_{X_j}$ is an empirical measure based on a random sample. Moreover, the condition holds for various smooth empirical measures; in fact, in this case the class of functions $\mathcal{F}$ for which DC(a) holds can be much larger than Glivenko-Cantelli or Donsker (see Radulovic and Wegkamp, 2003, and Gine and Nickl, 2008). Condition DC(b) is also a weak condition that holds for rich classes of functions, see, e.g., van der Vaart (1998).

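Theorem 5 (Limit distribution and inference theory for counterfactual marginal distributions). (1) Under conditions M and DC the estimators $\hat F_{Y_j}^k(y) = \int_{\mathcal{X}} \hat F_{Y_j}(y|x)\, d\hat F_{X_k}(x)$ of the marginal distribution functions $F_{Y_j}^k(y)$ jointly converge in law to the following Gaussian processes:

$$\sqrt{n}\,[\hat F_{Y_j}^k(y) - F_{Y_j}^k(y)] \Rightarrow \sqrt{\lambda_j}\, Z_j^k(y) + \sqrt{\lambda_k}\, G_{X_k}(F_{Y_j}(y|\cdot)) =: \bar Z_j^k(y), \quad j,k \in \{0,1\}, \qquad (D.1)$$

in $\ell^\infty(\mathcal{Y})$, where $y \mapsto \bar Z_j^k(y)$, $j,k \in \{0,1\}$, have zero mean and covariance function, for $j,k,r,s \in \{0,1\}$,

$$\bar\Sigma_{jr}^{ks}(y,\tilde y) := \sqrt{\lambda_j \lambda_r}\, \Sigma_{jr}^{ks}(y,\tilde y) + \sqrt{\lambda_k \lambda_s}\, E\big[ G_{X_k}(F_{Y_j}(y|\cdot))\, G_{X_s}(F_{Y_r}(\tilde y|\cdot)) \big], \qquad (D.2)$$

where $\Sigma_{jr}^{ks}$ is defined as in (3.4).

(2) Any bootstrap or other simulation method that consistently estimates the law of the empirical process $(\hat Z_0, \hat Z_1, \hat G_{X_0}, \hat G_{X_1})$ in the space $\ell^\infty(\mathcal{Y} \times \mathcal{X}) \times \ell^\infty(\mathcal{Y} \times \mathcal{X}) \times \ell^\infty(\mathcal{F}) \times \ell^\infty(\mathcal{F})$, also consistently estimates the law of the empirical process $(\bar Z_0^0, \bar Z_0^1, \bar Z_1^0, \bar Z_1^1)$ in the space $\ell^\infty(\mathcal{Y}) \times \ell^\infty(\mathcal{Y}) \times \ell^\infty(\mathcal{Y}) \times \ell^\infty(\mathcal{Y})$.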

Proof of Theorem 5: The first part of the theorem follows by the functional delta method in Lemma 3 and the Hadamard differentiability of the marginal functions demonstrated in Lemma 10 below with $t = 1/\sqrt{n}$. The second part of the theorem follows by the functional delta method for the bootstrap and other simulation methods in Lemma 6. □

The expressions for the covariance functions can be further characterized in some leading cases:

(1) The distributions of the covariates in groups 0 and 1 correspond to different populations and are estimated by the empirical distributions using mutually independent random samples. In this case $G_{X_0}$ and $G_{X_1}$ are independent integrals over Brownian bridges, and the second component of the covariance function in (D.2) is $\int_{\mathcal{X}} [F_{Y_j}(y|x) - F_{Y_j}^k(y)][F_{Y_r}(\tilde y|x) - F_{Y_r}^k(\tilde y)]\, dF_{X_k}(x)$ for $k = s$ and zero for $k \ne s$.

(2) The covariates in group 1 are known transformations of the covariates in group 0, $X_1 = g(X_0)$, and the covariate distribution in group 0 is estimated by the empirical distribution from a random sample. In this case $G_{X_0}$ and $G_{X_1}$ are highly dependent processes. The second component of the covariance function in (D.2) is $\int_{\mathcal{X}} [F_{Y_j}(y|x) - F_{Y_j}^0(y)][F_{Y_r}(\tilde y|x) - F_{Y_r}^0(\tilde y)]\, dF_{X_0}(x)$ for $k = s = 0$, $\int_{\mathcal{X}} [F_{Y_j}(y|g(x)) - F_{Y_j}^1(y)][F_{Y_r}(\tilde y|g(x)) - F_{Y_r}^1(\tilde y)]\, dF_{X_0}(x)$ for $k = s = 1$, and $\int_{\mathcal{X}} [F_{Y_j}(y|x) - F_{Y_j}^0(y)][F_{Y_r}(\tilde y|g(x)) - F_{Y_r}^1(\tilde y)]\, dF_{X_0}(x)$ for $k \ne s$.

Corollary 4. Limit distribution theory and validity of bootstrap and other simulation methods for the estimators of the marginal quantile function, quantile policy effects, distribution policy effects, and differentiable functionals can be obtained using similar arguments to Theorems 2 and 3, and Corollaries 1-3 with obvious changes of notation.

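D.2. Hadamard derivatives of marginal functionals. In order to state the next result, we define the pseudometric $\rho_{L^2(P)}$ on $\mathcal{Y} \times \mathcal{X}$ and on $\mathcal{F}$ by

$$\rho^j_{L^2(P)}((y,x),(\tilde y,\tilde x)) = \big[ E (Z_j(y,x) - Z_j(\tilde y,\tilde x))^2 \big]^{1/2}, \quad \text{for } j \in \{0,1\},$$

$$\rho^k_{L^2(P)}(f,\tilde f) = \big[ E (G_{X_k}(f) - G_{X_k}(\tilde f))^2 \big]^{1/2}, \quad \text{for } k \in \{0,1\}.$$

It follows from Lemma 18.15 in van der Vaart (1998) that $\mathcal{Y} \times \mathcal{X}$ is totally bounded under $\rho^j_{L^2(P)}$ and $Z_j$ has continuous paths with respect to $\rho^j_{L^2(P)}$ for each $j$. Moreover, the completion of $\mathcal{Y} \times \mathcal{X}$, denoted $\overline{\mathcal{Y} \times \mathcal{X}}$, with respect to either of the pseudometrics is compact. Likewise, $\mathcal{F}$ is totally bounded under $\rho^k_{L^2(P)}$ for each $k$.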

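Lemma 10. Consider the mapping $\phi : D_\phi \subset D = \ell^\infty(\mathcal{YX}) \times \ell^\infty(\mathcal{F}) \mapsto E = \ell^\infty(\mathcal{Y})$,

$$\phi(F_{Y_j}, F_{X_k}) := \int F_{Y_j}(\cdot|x)\, dF_{X_k}(x), \quad j,k \in \{0,1\},$$

where the domain $D_\phi$ is the product of the space of the conditional distribution functions $F_{Y_j}(\cdot|\cdot)$ on $\mathcal{YX}$ and the space of bounded maps $f \mapsto \int f\, dF_{X_k}$, where $F_{X_k}$ is a distribution function on $\mathcal{X}$, for $j,k \in \{0,1\}$. Consider the sequence $(F_{Y_j}^t, F_{X_k}^t) \in D_\phi$ such that for $\alpha_j^t := (F_{Y_j}^t - F_{Y_j})/(t\sqrt{\lambda_j})$, $d\beta_k^t := d(F_{X_k}^t - F_{X_k})/(t\sqrt{\lambda_k})$, and $\beta_k^t(f) := \int f\, d\beta_k^t$, as $t \searrow 0$

$$\alpha_j^t \to \alpha_j \in C(\overline{\mathcal{YX}}, \rho^j_{L^2(P)}) \quad \text{in } \ell^\infty(\mathcal{YX}),$$

$$\beta_k^t \to \beta_k \in C(\mathcal{F}, \rho^k_{L^2(P)}) \quad \text{in } \ell^\infty(\mathcal{F}),$$

for the $F_{X_k}$-Donsker class $\mathcal{F}$ and $j,k \in \{0,1\}$. Finally, we assume that $\{F_{Y_j}(y|x),\ y \in \mathcal{Y}\}$ is $F_{X_k}$-Donsker, for $j,k \in \{0,1\}$. Then, as $t \searrow 0$

$$\frac{\phi(F_{Y_j}^t, F_{X_k}^t) - \phi(F_{Y_j}, F_{X_k})}{t} \to \phi'_{F_{Y_j},F_{X_k}}(\alpha_j, \beta_k),$$

where

$$\phi'_{F_{Y_j},F_{X_k}}(\alpha_j, \beta_k) := \sqrt{\lambda_j} \int \alpha_j(\cdot, x)\, dF_{X_k}(x) + \sqrt{\lambda_k}\, \beta_k(F_{Y_j}),$$

and the derivative map $(\alpha,\beta) \mapsto \phi'_{F_{Y_j},F_{X_k}}(\alpha,\beta)$, mapping $D$ to $E$, is continuous.

That is, we identify $F_{X_k}$ with the map $f \mapsto \int f\, dF_{X_k}$ in $\ell^\infty(\mathcal{F})$.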


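Proof of Lemma 10. Write

$$\frac{\phi(F_{Y_j}^t, F_{X_k}^t) - \phi(F_{Y_j}, F_{X_k})}{t} - \phi'_{F_{Y_j},F_{X_k}}(\alpha_j, \beta_k)$$

as

$$\sqrt{\lambda_j} \int (\alpha_j^t - \alpha_j)\, dF_{X_k} + \sqrt{\lambda_k} \int F_{Y_j}\, (d\beta_k^t - d\beta_k) + \sqrt{\lambda_j \lambda_k} \int \alpha_j\, t\, d\beta_k^t + \sqrt{\lambda_j \lambda_k} \int (\alpha_j^t - \alpha_j)\, t\, d\beta_k^t. \qquad (D.3)$$

The first term of (D.3) is bounded by $\|\alpha_j^t - \alpha_j\|_{\mathcal{YX}} \int dF_{X_k} \to 0$. The second term vanishes, since for any $F_{X_k}$-Donsker set $\mathcal{F}$, $\int f\, d\beta_k^t \to \int f\, d\beta_k$ in $\ell^\infty(\mathcal{F})$, and $\{F_{Y_j}(y|x),\ y \in \mathcal{Y}\} \subset \mathcal{F}$ by assumption. The third term vanishes by the argument provided below. The fourth term vanishes, since $|\int (\alpha_j^t - \alpha_j)\, t\, d\beta_k^t| \le \|\alpha_j^t - \alpha_j\|_{\mathcal{YX}} \int |t\, d\beta_k^t| \le 2 \|\alpha_j^t - \alpha_j\|_{\mathcal{YX}} \to 0$.

Since $\alpha_j$ is continuous on the compact semi-metric space $(\overline{\mathcal{YX}}, \rho^j_{L^2(P)})$, there exists a finite measurable partition $\cup_{i=1}^m \mathcal{YX}_{im}$ of $\mathcal{YX}$ such that $\alpha_j$ varies less than $\epsilon$ on each subset. Let $\pi_m(y,x) = (y_{im}, x_{im})$ if $(y,x) \in \mathcal{YX}_{im}$, where $(y_{im}, x_{im})$ is an arbitrarily chosen point within $\mathcal{YX}_{im}$ for each $i$; also let $1_{im}(y,x) = 1\{(y,x) \in \mathcal{YX}_{im}\}$. Then

$$\Big| \int \alpha_j\, t\, d\beta_k^t \Big| \le 2 \|\alpha_j - \alpha_j \circ \pi_m\|_{\mathcal{YX}} + \sum_{i=1}^m |\alpha_j(y_{im}, x_{im})|\, t\, |\beta_k^t(1_{im})|$$

$$\le 2\epsilon + \sum_{i=1}^m |\alpha_j(y_{im}, x_{im})|\, t\, (|\beta_k(1_{im})| + o(1))$$

$$\le 2\epsilon + t\, m \Big[ \|\alpha_j\|_{\mathcal{YX}} \max_{i \le m} |\beta_k(1_{im})| + o(1) \Big]$$

$$\le 2\epsilon + O(t),$$

since $\{1_{im},\ i \le m\}$ is an $F_{X_k}$-Donsker class. The constant $\epsilon$ is arbitrary, so the left hand side of the preceding display converges to zero.

Finally, the norm on $D$ is given by $\|\cdot\|_{\mathcal{YX}} \vee \|\cdot\|_{\mathcal{F}}$. The second component of the derivative map is trivially continuous with respect to $\|\cdot\|_{\mathcal{F}}$. The first component is continuous with respect to $\|\cdot\|_{\mathcal{YX}}$ by the first term in (D.3) vanishing, as shown above. Hence the derivative map is continuous. □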
Appendix E. Functional Delta Method and Bootstrap and Other Simulation Methods for Z-processes

This section derives a preliminary result that is key to deriving the limit distribution and inference theory for various estimators of the conditional distribution and quantile functions. This result shows that suitably defined Z-estimators satisfy a functional central limit theorem and that we can estimate their laws using bootstrap and related methods. The result follows from a lemma that establishes Hadamard differentiability of Z-functionals in spaces that are particularly well-suited for our applications.

E.1. Limit distribution and inference theory for approximate Z-processes. Let us consider an index set $T$ and a set $\Theta \subset \mathbb{R}^p$. We consider Z-estimation processes $\{\hat\theta(u),\ u \in T\}$, where for each $u \in T$, $\hat\theta(u)$ satisfies $\|\hat\Psi(\hat\theta(u), u)\| \le \inf_{\theta \in \Theta} \|\hat\Psi(\theta, u)\| + \epsilon_n$, with $\epsilon_n \searrow 0$ at some rate. That is, $\hat\theta(u)$ is an approximate solution to the problem of minimizing $\|\hat\Psi(\theta, u)\|$ over $\theta \in \Theta$. The random function $(\theta,u) \mapsto \hat\Psi(\theta,u)$ is an estimator of some fixed population function $(\theta,u) \mapsto \Psi(\theta,u)$, and satisfies a functional central limit theorem. The following lemma specifies conditions under which the Z-processes satisfy a functional central limit theorem, and under which bootstrap and other simulation methods consistently estimate the law of this process.

Lemma 11 (Limit distribution and inference theory for approximate Z-processes). Let $T$ be a relatively compact set of some metric space, and $\Theta$ be a compact subset of $\mathbb{R}^p$. Assume that

(i) for each $u \in T$, $\Psi(\cdot,u) : \Theta \mapsto \mathbb{R}^p$ possesses a unique zero at $\theta_0(u) \in \mathrm{interior}\ \Theta$, and has inverse $\Psi^{-1}(\cdot,u)$ that is continuous at 0 uniformly in $u \in T$,

(ii) $\Psi(\cdot,u)$ is continuously differentiable at $\theta_0(u)$ uniformly in $u \in T$, with derivative $\dot\Psi_{\theta_0(u),u}$ that is uniformly non-singular, namely $\inf_{u \in T} \inf_{\|h\|=1} \|\dot\Psi_{\theta_0(u),u} h\| > 0$,

(iii) $\sqrt{n}(\hat\Psi - \Psi) \Rightarrow Z$ in $\ell^\infty(\Theta \times T)$, where $Z$ is a.s. continuous on $\Theta \times T$ with respect to the Euclidean metric,

(iv) Bootstrap or some other method consistently estimates the law of $\sqrt{n}(\hat\Psi - \Psi)$.

For each $u \in T$, let $\hat\theta(u)$ be such that $\|\hat\Psi(\hat\theta(u),u)\| \le \inf_{\theta \in \Theta} \|\hat\Psi(\theta,u)\| + \epsilon_n$, with $\epsilon_n = o(n^{-1/2})$. Then, under conditions (i)-(iii)

$$\sqrt{n}(\hat\theta(\cdot) - \theta_0(\cdot)) \Rightarrow -\dot\Psi^{-1}_{\theta_0(\cdot),\cdot}\,[Z(\theta_0(\cdot),\cdot)] \quad \text{in } \ell^\infty(T).$$

Moreover, any bootstrap or other method that satisfies condition (iv) consistently estimates the law of the empirical process $\sqrt{n}(\hat\theta - \theta_0)$ in $\ell^\infty(T)$.
Proof of Lemma 11. The results follow by the functional delta method in Lemma 3 and by the functional delta method for bootstrap and other methods in Lemma 6, and the Hadamard differentiability of Z-functionals established in Lemma 12 with $t = 1/\sqrt{n}$. □

The proof of the preceding result relies on the following lemma. Let $T$ be a relatively compact set of some metric space, and $\Theta$ be a compact subset of $\mathbb{R}^p$. An element $\theta \in \Theta$ is an $r$-approximate zero of the map $\theta \mapsto z(\theta,u)$ if for some $r > 0$

$$\|z(\theta,u)\| \le \inf_{\theta' \in \Theta} \|z(\theta',u)\| + r.$$

Let $\phi(\cdot,r) : \ell^\infty(\Theta) \mapsto \Theta$ be a map that assigns one of its $r$-approximate zeroes $\phi(z(\cdot,u),r)$ to each element $z(\cdot,u) \in \ell^\infty(\Theta)$.
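Lemma 12. Assume that conditions (i) and (ii) on the function $\Psi$ stated in the preceding lemma hold. Take any $z_t \to z$ uniformly on $\Theta \times T$ as $t \searrow 0$, for a continuous map $z : \Theta \times T \mapsto \mathbb{R}^p$, and suppose that $q_t \searrow 0$ uniformly on $T$ as $t \searrow 0$. Then, for the $t q_t(u)$-approximate zero of $\Psi(\cdot,u) + t z_t(\cdot,u)$ denoted as $\theta_t(u) = \phi(\Psi(\cdot,u) + t z_t(\cdot,u),\ t q_t(u))$ we have that, uniformly in $u \in T$,

$$\frac{\theta_t(u) - \theta_0(u)}{t} \to -\dot\Psi^{-1}_{\theta_0(u),u}\,[z(\theta_0(u),u)].$$

Here it is useful to think of $t$ as $1/\sqrt{n}$, where $n$ is the sample size.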

Remark. Our lemma is an alternative to van der Vaart and Wellner's (1996) Lemma 3.9.34 on Hadamard differentiability of Z-functionals in general normed spaces. The conditions of their lemma are difficult to meet in our context because they include the uniform convergence of the functions $z_t$ over the parameter space $\mathcal{F} = \ell^\infty(T)$, the collection of all bounded functions on $T$, which is an extremely large parameter space. In particular, to apply their lemma we need that the empirical processes $\sqrt{n}(\hat\Psi - \Psi)$ indexed by $\mathcal{F} = \ell^\infty(T)$ converge weakly in the space $\ell^\infty(\mathcal{F} \times T)$, which appears to be difficult to attain in applications such as quantile regression processes. Indeed, note that weak convergence in this space requires $\mathcal{F}$ to be totally bounded, which is hard to attain when $\mathcal{F}$ is too rich a space. See van der Vaart and Wellner (1996) p. 396 for a comment on the limitation of their Lemma 3.9.34. Moreover, our lemma allows for approximate Z-estimators. This allows us to cover quantile regression processes, where exact Z-estimators do not exist.
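To illustrate why approximate zeroes are needed, consider the simplest quantile problem without covariates, used here only as an assumed example: the empirical moment function is $\hat\Psi(\theta,u) = n^{-1} \sum_i 1\{Y_i \le \theta\} - u$, which is a step function in $\theta$ and generally has no exact zero. The Python sketch below, with hypothetical function names, searches over the observed data points and returns one that brings $|\hat\Psi|$ within $1/n$ of zero, a tolerance that is $o(n^{-1/2})$.

```python
def empirical_moment(theta, u, data):
    """Psi_hat(theta, u): share of observations not exceeding theta, minus u."""
    return sum(1.0 for y in data if y <= theta) / len(data) - u


def approximate_zero(u, data):
    """Return the observed point minimizing |Psi_hat(theta, u)| and the attained value.

    The search is restricted to the observed data points, which is an
    assumption of this sketch; for u in (0, 1] the attained value is at most 1/n.
    """
    best = min(data, key=lambda theta: abs(empirical_moment(theta, u, data)))
    return best, abs(empirical_moment(best, u, data))
```

With three observations and $u = 0.5$ the moment function only takes the values $-1/2$, $-1/6$, $1/6$ and $1/2$, so the best attainable value is $1/6$ rather than zero.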

Proof  of  Lemnia  12.  We  have  that  ^(6'o(n),  u)  =  0  for  all  u  G  T.  Let  Zf  —>  z  uniformly 
on  0  X  T  for  a  map  z  :  Q  x  T  i-^  £°^{Q  x  T)  that  is  continuous  at  each  point,  and  qt  \  0 
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uniformly  in  u  E  T  as  t\0.  By  definition  9t{u)  =  0('I'(-,u)  +  tzt{-,u),tqt{u))  satisfies 

\\^{et{u),u)-<l'{Bo{u),u)+tZt{e;{u),u)\\  <  M  \\^{B,u)+tZi{e,u)\\+tqt{u)  =:  t\,,{u)+tq,{u), 

uniformly  in  u  G  T.  Tlie  the  rest  of  tlie  proof  iias  tliree  steps.  In  Step  1,  we  establish 
a  rate  of  convergence  for  9t{-)  to  6{-).  In  Step  2,  we  verify  the  main  claim  of  the  lemma 
concerning  the  Unear  representation  for  t~^{6t{-)  —  0{-)),  assuming  that  A;(-)  =  o(l).  In 
Step  3,  we  verify  that  At(-)  =  o(l). 

Step 1. Here we show that uniformly in $u \in T$, $\|\theta_t(u) - \theta_0(u)\| \leq c^{-1}\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\| = O(t)$. Note that $\lambda_t(u) \leq \|t^{-1}\Psi(\theta_0(u),u) + z_t(\theta_0(u),u)\| = \|z(\theta_0(u),u) + o(1)\| = O(1)$ uniformly in $u \in T$. We conclude that uniformly in $u \in T$, as $t \searrow 0$,

\[ t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) = -z_t(\theta_t(u),u) + O(\lambda_t(u) + q_t(u)) = O(1) \]

and that uniformly in $u \in T$, $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\| = O(t)$. By assumption $\Psi(\cdot,u)$ has a unique zero at $\theta_0(u)$ and has an inverse that is continuous at zero uniformly in $u \in T$; hence it follows that uniformly in $u \in T$,

\[ \|\theta_t(u) - \theta_0(u)\| \leq d_H(\Psi^{-1}(\Psi(\theta_t(u),u),u), \Psi^{-1}(0,u)) \to 0, \]

where $d_H$ is the Hausdorff distance. By continuous differentiability assumed to hold uniformly in $u \in T$, $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) - \dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\| = o(\|\theta_t(u) - \theta_0(u)\|)$, so that uniformly in $u \in T$

\[ \liminf_{t \searrow 0} \frac{\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\|}{\|\theta_t(u) - \theta_0(u)\|} \geq \liminf_{t \searrow 0} \frac{\|\dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\|}{\|\theta_t(u) - \theta_0(u)\|} \geq \inf_{\|h\|=1} \|\dot\Psi_{\theta_0(u),u}(h)\| = c > 0, \]

where $h$ ranges over $\mathbb{R}^p$, and $c > 0$ by assumption. Thus, uniformly in $u \in T$, $\|\theta_t(u) - \theta_0(u)\| \leq c^{-1}\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)\| = O(t)$.

Step 2. Here we verify the main claim of the lemma. Using continuous differentiability uniformly in $u$ again, conclude $\|\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u) - \dot\Psi_{\theta_0(u),u}[\theta_t(u) - \theta_0(u)]\| = o(t)$. Below we will show that $\lambda_t(u) = o(1)$, and we also have $q_t(u) = o(1)$ uniformly in $u \in T$ by assumption. Thus, we can conclude that uniformly in $u \in T$, $t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) = -z_t(\theta_t(u),u) + o(1) = -z(\theta_0(u),u) + o(1)$ and

\[ t^{-1}[\theta_t(u) - \theta_0(u)] = \dot\Psi^{-1}_{\theta_0(u),u}\left[t^{-1}(\Psi(\theta_t(u),u) - \Psi(\theta_0(u),u)) + o(1)\right] = -\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)] + o(1). \]

Step 3. In this step we show that $\lambda_t(u) = o(1)$ uniformly in $u \in T$. Note that for $\bar\theta_t(u) := \theta_0(u) - t\,\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)] = \theta_0(u) + O(t)$, we have that $\bar\theta_t(u) \in \Theta$, for small enough $t$, uniformly in $u \in T$; moreover, $\lambda_t(u) \leq \|t^{-1}\Psi(\bar\theta_t(u),u) + z_t(\bar\theta_t(u),u)\| = \|-\dot\Psi_{\theta_0(u),u}[\dot\Psi^{-1}_{\theta_0(u),u}[z(\theta_0(u),u)]] + z(\theta_0(u),u) + o(1)\| = o(1)$, as $t \searrow 0$. □
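
As a numerical illustration of the linear representation in Step 2, the following minimal sketch perturbs a scalar estimating equation and compares the finite difference $t^{-1}(\theta_t - \theta_0)$ with $-\dot\Psi^{-1}_{\theta_0}[z(\theta_0)]$. The functions `psi`, `z`, and `solve_root` are hypothetical choices made only for illustration and do not come from the paper; the sketch also assumes $q_t = 0$, so that $\theta_t$ is an exact zero of the perturbed equation.

```python
import math

def psi(theta):
    # Hypothetical smooth moment function with a unique zero at theta0 = 1.
    return theta**3 + theta - 2.0

def z(theta):
    # Hypothetical perturbation direction, continuous in theta.
    return math.cos(theta)

def solve_root(f, lo=-5.0, hi=5.0, iters=200):
    # Plain bisection; assumes f(lo) < 0 < f(hi) with a single sign change.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def finite_difference(t):
    # theta_t solves the perturbed equation psi(theta) + t * z(theta) = 0.
    theta_t = solve_root(lambda th: psi(th) + t * z(th))
    theta_0 = solve_root(psi)
    return (theta_t - theta_0) / t

# Derivative predicted by the lemma: -psi'(theta0)^{-1} z(theta0), with psi'(1) = 4.
predicted = -z(1.0) / 4.0
```

Under these assumptions, `finite_difference(t)` should approach `predicted` as `t` shrinks, which is the scalar version of the Hadamard derivative established in the proof.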

Appendix F. Z-Estimators of Conditional Quantile and Distribution Functions

This section derives limit theory for the principal estimators of conditional distribution and quantile functions. These results establish the validity of bootstrap and other resampling plans for the entire quantile regression process, the entire distribution regression process, and related processes arising in estimation of various conditional quantile and distribution functions. These results may be of substantial independent interest.

In order to prove the results, we use Lemmas 11 and 12. We also specify some primitive conditions that cover all of our leading examples. In all these examples, we have functional parameter values $u \mapsto \theta(u)$, where $u \in T \subset \mathbb{R}$ and $\theta(u) \in \Theta \subset \mathbb{R}^p$, where for each $u \in T$, $\theta_0(u)$ solves the equation

\[ \Psi(\theta,u) := E[g(W,\theta,u)] = 0, \]

where $g : \mathcal{W} \times \Theta \times T \mapsto \mathbb{R}^p$, and $W := (X,Y)$ is a random vector with support $\mathcal{W}$. For estimation purposes we have an empirical analog of the above moment functions

\[ \hat\Psi(\theta,u) = E_n[g(W_i,\theta,u)], \]

where $E_n$ is the empirical expectation and $(W_1, \ldots, W_n)$ is a random sample from $W$. For each $u \in T$, the estimator $\hat\theta(u)$ satisfies $\|\hat\Psi(\hat\theta(u),u)\| \leq \inf_{\theta \in \Theta} \|\hat\Psi(\theta,u)\| + \epsilon_n$, with

Condition Z.1. The set $\Theta$ is a compact subset of $\mathbb{R}^p$ and $T$ is either a finite subset or a bounded open subset of $\mathbb{R}^d$.

(i) For each $u \in T$, $\Psi(\theta,u) := E g(W,\theta,u) = 0$ has a unique zero at $\theta_0(u) := (\alpha_0(u)', \beta_0')' \in$ interior $\Theta$.

(ii) The map $(\theta,u) \mapsto \Psi(\theta,u)$ is continuously differentiable at $(\theta_0(u),u)$ with a uniformly bounded derivative on $T$, where differentiability in $u$ needs to hold for the case of $T$ being a bounded open subset of $\mathbb{R}^d$; $\dot\Psi_{\theta,u} = G(\theta,u) = \frac{\partial}{\partial\theta'} E g(W,\theta,u)$ is uniformly nonsingular at $\theta_0(u)$, namely $\inf_{u \in T} \inf_{\|h\|=1} \|\dot\Psi_{\theta_0(u),u} h\| > 0$.

(iii) The function set $\mathcal{G} = \{g(W,\theta,u), (\theta,u) \in \Theta \times T\}$ is P-Donsker with a square integrable envelope $G$. The map $(\theta,u) \mapsto g(W,\theta,u)$ is continuous at each $(\theta,u) \in \Theta \times T$ with probability one.

Condition Z.2. Either of the following holds:

(a) the conditional distribution has the form $F_Y(u|x) = \Lambda(x,\theta_0(u))$; or

(b) the quantile functions have the form $Q_Y(u|x) = Q(x,\theta_0(u))$,

where the functions $\theta \mapsto \Lambda(x,\theta)$ and $\theta \mapsto Q(x,\theta)$ are continuously differentiable in $\theta$ with derivatives that are uniformly bounded over the set $\mathcal{X}$.

Lemma 13. Condition Z.1 implies conditions (i)-(iv) of Lemma 11. In particular, condition (iii) holds with $\sqrt{n}(\hat\Psi - \Psi) \Rightarrow Z$ in $\ell^\infty(T)$, where $Z$ is a zero mean Gaussian process with continuous paths in $u \in T$ and covariance function

\[ \Omega(u,\tilde u) = E[g(W,\theta_0(u),u)\, g(W,\theta_0(\tilde u),\tilde u)']. \]

Condition (iv) holds with the set of consistent methods for estimating the law of $\sqrt{n}(\hat\Psi - \Psi)$ consisting of the bootstrap and, more generally, exchangeable bootstraps. Consequently, the conclusions of Lemma 11 hold, namely $\sqrt{n}(\hat\theta(\cdot) - \theta_0(\cdot)) \Rightarrow -G(\theta_0(\cdot),\cdot)^{-1}[Z(\theta_0(\cdot),\cdot)]$ in $\ell^\infty(T)$. Moreover, the bootstrap and exchangeable bootstraps consistently estimate the law of the empirical process $\sqrt{n}(\hat\theta - \theta_0)$.

This  lemma  presents  a  useful  result  in  its  own  right.  From  the  point  of  view  of  this  paper, 
the  following  result,  a  corollary  of  the  lemma,  is  of  immediate  interest  to  us  since  it  verifies 
Condition  D  and  Condition  Q  for  a  wide  class  of  estimators  of  conditional  distribution  and 
quantile  functions. 

Theorem 6 (Limit distribution and inference theory for Z-estimators of conditional distribution and quantile functions). 1. Under conditions Z.1-Z.2(a), the estimator $(u,x) \mapsto \hat F_Y(u|x)$ of the conditional distribution function $(u,x) \mapsto F_Y(u|x)$ converges in law to a continuous Gaussian process:

\[ \sqrt{n}(\hat F_Y(u|x) - F_Y(u|x)) \Rightarrow Z(u,x) := -\frac{\partial \Lambda(x,\theta_0(u))}{\partial\theta'} G(\theta_0(u),u)^{-1} Z(\theta_0(u),u) \quad (F.1) \]

in $\ell^\infty(\mathcal{Y} \times \mathcal{X})$, where $(u,x) \mapsto Z(u,x)$ has zero mean and covariance function $\Sigma_Z(u,x,\tilde u,\tilde x) := E[Z(u,x) Z(\tilde u,\tilde x)]$. Moreover, the bootstrap and exchangeable bootstraps consistently estimate the law of $Z$.

2. Under conditions Z.1-Z.2(b), the estimator $(u,x) \mapsto \hat Q_Y(u|x)$ of the conditional quantile function $(u,x) \mapsto Q_Y(u|x)$ converges in law to a continuous Gaussian process:

\[ \sqrt{n}(\hat Q_Y(u|x) - Q_Y(u|x)) \Rightarrow V(u,x) := -\frac{\partial Q(x,\theta_0(u))}{\partial\theta'} G(\theta_0(u),u)^{-1} Z(\theta_0(u),u) \quad (F.2) \]

in $\ell^\infty((0,1) \times \mathcal{X})$, where the process $(u,x) \mapsto V(u,x)$ has zero mean and covariance function $\Sigma_V(u,x,\tilde u,\tilde x) := E[V(u,x) V(\tilde u,\tilde x)]$. Moreover, the bootstrap and exchangeable bootstraps consistently estimate the law of $V$.
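
To make the bootstrap claim concrete, here is a minimal sketch, not taken from the paper, of an exchangeable (weighted) bootstrap for the simplest Z-estimator of this type: an unconditional quantile, that is, quantile regression on a constant. The function names, the i.i.d. standard exponential weights, the sample size, and the number of bootstrap draws are all illustrative assumptions.

```python
import numpy as np

def weighted_quantile(y, w, u):
    # Smallest y whose weighted empirical CDF reaches u; this approximately
    # solves the weighted moment equation sum_i w_i * (u - 1{y_i <= q}) = 0.
    order = np.argsort(y)
    cdf = np.cumsum(w[order]) / np.sum(w)
    idx = np.searchsorted(cdf, u, side="left")
    return y[order][min(idx, len(y) - 1)]

def bootstrap_sd(y, u, n_boot, rng):
    # Standard deviation of sqrt(n) * (q* - q_hat) across weighted bootstrap draws.
    n = len(y)
    q_hat = weighted_quantile(y, np.ones(n), u)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.exponential(1.0, size=n)  # exchangeable weights (assumed choice)
        draws[b] = np.sqrt(n) * (weighted_quantile(y, w, u) - q_hat)
    return draws.std(ddof=1)

rng = np.random.default_rng(0)
y = rng.standard_normal(4000)            # hypothetical data: standard normal
sd_boot = bootstrap_sd(y, 0.5, 400, rng)
# Limit standard deviation sqrt(u(1-u)) / f(Q(u)) at u = 0.5 for N(0, 1) data.
sd_limit = np.sqrt(0.5 * 0.5) / (1.0 / np.sqrt(2.0 * np.pi))
```

Under these assumptions `sd_boot` should be close to `sd_limit`, although the agreement is only approximate in finite samples.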

Proof of Lemma 13. We shall verify conditions (i)-(iv) of Lemma 11.

We consider the case where $T$ is a bounded open subset of $\mathbb{R}$. The proof for the case with a finite $T$ is simpler, and follows similarly. To show condition (i), we note that by the implicit function theorem and uniqueness of $\theta_0$, the inverse map $\Psi^{-1}(\mu,u)$ exists on an open neighborhood of each pair $(\mu = 0, u)$, and it is continuously differentiable in $(\mu,u)$ at each pair $(\mu = 0, u)$ with a uniformly bounded derivative. This implies that for any sequence of points $(\mu_t,u_t) \to (0,u)$ with $u \in \bar T$, where $\bar T$ is the closure of $T$, we have that $\|\Psi^{-1}(\mu_t,u_t) - \Psi^{-1}(0,u_t)\| = O(\|\mu_t\|) = o(1)$, verifying the continuity of the inverse map at 0 uniformly in $u$. We can also conclude that $\theta_0(u) = \Psi^{-1}(0,u)$ is uniformly continuous on $T$ and we can extend it to $\bar T$ by taking limits.

To show condition (ii), we take any sequence $(u_t,h_t) \to (u,h)$ with $u \in T$, $h \in \mathbb{R}^p$ and then note that, for $t^* \in [0,t]$,

\[ \Delta_t(u_t,h_t) = t^{-1}\{\Psi(\theta_0(u_t) + t h_t, u_t) - \Psi(\theta_0(u_t),u_t)\} = \frac{\partial\Psi}{\partial\theta'}(\theta_0(u_t) + t^* h_t, u_t)\, h_t \to \frac{\partial\Psi}{\partial\theta'}(\theta_0(u),u)\, h = G(\theta_0(u),u) h, \]

using the continuity hypotheses on the derivative $\partial\Psi/\partial\theta'$ and the continuity of $u \mapsto \theta_0(u)$. Hence by Lemma 5, we conclude that $\sup_{u \in T, \|h\|=1} |\Delta_t(u,h) - G(\theta_0(u),u)h| \to 0$ as $t \searrow 0$.

To show condition (iii), note that by the Donsker central limit theorem for $\hat\Psi(\theta,u) = E_n[g(W_i,\theta,u)]$ we have that $\sqrt{n}(\hat\Psi - \Psi) \Rightarrow Z$, where $Z$ is a zero mean Gaussian process with covariance function $\Omega(u,\tilde u) = E[g(W,\theta_0(u),u)\, g(W,\theta_0(\tilde u),\tilde u)']$ that has continuous paths with respect to the $L_2(P)$ semi-metric on $\mathcal{G}$. The map $(\theta,u) \mapsto g(W,\theta,u)$ is continuous at each $(\theta,u)$ with probability one. The only result that is not immediate from the assumptions stated is that $Z$ also has continuous paths on $\Theta \times T$ with respect to the Euclidean metric $\|\cdot\|$. By assumption $Z$ has continuous paths with respect to $\rho_{L_2(P)}((\theta,u),(\tilde\theta,\tilde u)) = (E[g(W,\theta,u) - g(W,\tilde\theta,\tilde u)]^2)^{1/2}$. As $\|(\theta,u) - (\tilde\theta,\tilde u)\| \to 0$, we have that $g(W,\theta,u) - g(W,\tilde\theta,\tilde u) \to 0$ almost surely. It follows by the dominated convergence theorem, with dominating function equal to $(2G)^2$, where $G$ is the square integrable envelope for the function class $\mathcal{G}$, that $(E[g(W,\theta,u) - g(W,\tilde\theta,\tilde u)]^2)^{1/2} \to 0$. This verifies the continuity condition. The square integrable envelope $G$ exists by assumption.

To show (iv), we simply invoke Theorem 3.6.13 in van der Vaart and Wellner (1996), which implies that the bootstrap and, more generally, exchangeable bootstraps consistently estimate the limit law of $\sqrt{n}(\hat\Psi - \Psi)$, say $G$, in the sense of equation (A.2). □

Proof of Theorem 6. This result follows directly from Lemma 12, the functional delta method in Lemma 3, the chain rule for Hadamard differentiable functionals in Lemma 4, and the preservation of validity of bootstrap and other methods for Hadamard differentiable functionals in Lemma 6. □

F.1. Examples of conditional quantile estimation methods. We consider the location and quantile regression models described in the text.

Example 2. Quantile regression. The conditional quantile function of the outcome variable $Y$ given the covariate vector $X$ is given by $X'\beta_0(\cdot)$. Here we can take the moment functions corresponding to the canonical quantile regression approach:

\[ g(W,\beta,u) = (u - 1\{Y \leq X'\beta\}) X. \quad (F.3) \]

We assume that the conditional density $f_Y(\cdot|X)$ is uniformly bounded and is continuous at $X'\beta_0(u)$ uniformly in $u \in T$, almost surely; moreover, $\inf_{u \in T} f_Y(X'\beta_0(u)|X) > c > 0$ almost surely; and $E[XX']$ is finite and of full rank. The true parameter $\beta_0(u)$ solves $E g(W,\beta,u) = 0$ and we assume that the parameter space $\Theta$ is such that $\beta_0(u) \in$ interior $\Theta$ for each $u \in (0,1)$.

Lemma 14. Conditions Z.1-Z.2(b) hold for this example with moment function given by (F.3), $T = (0,1)$, $Q_Y(u|x) = x'\beta_0(u)$, $G(\beta_0(u),u) = -E[f_Y(X'\beta_0(u)|X) XX']$, and $\Omega(u,\tilde u) = \{\min(u,\tilde u) - u\tilde u\} E[XX']$.
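
The moment function (F.3) and the covariance formula in Lemma 14 can be checked by simulation. The sketch below is an illustration under assumptions that are not in the paper: a hypothetical location design with one uniform regressor and standard normal errors, for which the true quantile coefficients are known in closed form. All function and variable names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def qr_moment(y, x, beta, u):
    # g(W, beta, u) = (u - 1{Y <= X'beta}) X, one row per observation.
    indicator = (y <= x @ beta).astype(float)
    return (u - indicator)[:, None] * x

def simulate(n, rng):
    # Hypothetical design: intercept plus one bounded regressor, with
    # Y = 1 + 2*X1 + V and V ~ N(0, 1) independent of X.
    x = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])
    y = x @ np.array([1.0, 2.0]) + rng.standard_normal(n)
    return y, x

def true_beta(u):
    # In this location design only the intercept shifts with u.
    return np.array([1.0 + NormalDist().inv_cdf(u), 2.0])

rng = np.random.default_rng(1)
y, x = simulate(200_000, rng)
u, u_tilde = 0.25, 0.75
g_u = qr_moment(y, x, true_beta(u), u)
g_ut = qr_moment(y, x, true_beta(u_tilde), u_tilde)
mean_moment = g_u.mean(axis=0)                    # sample analog of E g = 0
omega_hat = g_u.T @ g_ut / len(y)                 # sample analog of Omega(u, u~)
omega_formula = (min(u, u_tilde) - u * u_tilde) * (x.T @ x / len(y))
```

With a sample this large, `mean_moment` should be near zero and `omega_hat` should be close to `omega_formula`, up to simulation noise.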

Proof of Lemma 14. To show Z.1, we need to verify conditions on the derivatives of the map $\beta \mapsto E g(W,\beta,u)$. It is straightforward to show that at $(\beta,u) = (\beta_0(u),u)$,

\[ \frac{\partial}{\partial(\beta',u)} E g(W,\beta,u) = [G(\beta,u), EX] = [-E[f_Y(X'\beta|X) XX'], EX], \]

and the right hand side is continuous at $(\beta_0(u),u)$. This follows using the dominated convergence theorem, the a.s. continuity and boundedness of the mapping $y \mapsto f_Y(y|X)$ at $X'\beta_0(u)$, as well as finiteness of $E\|X\|^2$. Finally, note that $\beta_0(u)$ is the unique solution to $E g(W,\beta,u) = 0$ for each $u$ because it is a root of a gradient of a convex function. Moreover, uniformly in $u \in (0,1)$, $-G(\beta_0(u),u) \geq \underline{f}\, EXX' > 0$ in the positive definite sense, where $\underline{f}$ is the uniform lower bound on $f_Y(X'\beta_0(u)|X)$.

To show Z.1(iii), we verify that the function class $\mathcal{G}$ is P-Donsker with a square integrable envelope and the continuity hypothesis. The function classes $\mathcal{F}_1 = T$ and $\mathcal{F}_2 = \{1\{Y \leq X'\beta\}, \beta \in \mathbb{R}^p\}$ are VC classes. Therefore the function classes $\mathcal{F}_{kj} = \mathcal{F}_k X_j$ are also VC classes because they are formed as products of a VC class with a fixed function (Lemma 2.6.18 in van der Vaart and Wellner, 1996). The difference $\mathcal{F}_{1j} - \mathcal{F}_{2j}$ is a Lipschitz transform of VC classes, so it is P-Donsker by Example 19.9 in van der Vaart (1998). The collection $\mathcal{G} = \{\mathcal{F}_{1j} - \mathcal{F}_{2j}, j = 1, \ldots, p\}$ is thus also Donsker. The envelope is given by $2\max_j |X_j|$, which is square-integrable. Finally, the map $(\beta,u) \mapsto (u - 1\{Y \leq X'\beta\}) X$ is continuous at each $(\beta,u) \in \Theta \times T$ with probability one by the absolute continuity of the conditional distribution of $Y$.

To show Z.2(b), we note that the map $(x,\theta) \mapsto x'\theta$ trivially verifies the hypotheses of Z.2(b) provided the set $\mathcal{X}$ is compact. □

Example 1. Classical regression. This is the location model $Y = X'\beta_0 + V$, where $X$ is independent of $V$, so the conditional quantile function of the outcome variable $Y$ given the conditioning variable $X$ is given by $X'\beta_0 + \alpha_0(\cdot)$, where $E[Y|X] = X'\beta_0$ and $\alpha_0(\cdot) = Q_V(\cdot)$. Here we can take the moment functions corresponding to using least squares to estimate $\beta_0$ and sample quantiles of residuals to estimate $\alpha_0$:

\[ g(W,\alpha,\beta,u) = [(u - 1\{Y - X'\beta \leq \alpha\}), (Y - X'\beta) X']'. \quad (F.4) \]

We assume that the density of $V = Y - X'\beta_0$, $f_V(\cdot)$, is uniformly bounded and is continuous at $\alpha_0(u)$ uniformly in $u \in T$, almost surely; moreover, $\inf_{u \in T} f_V(\alpha_0(u)) > c > 0$ almost surely; $EXX'$ is finite and of full rank; and $EY^2 < \infty$. The true parameter value $(\alpha_0(u), \beta_0')'$ solves $E g(W,\alpha,\beta,u) = 0$ and we assume that the parameter space $\Theta$ is such that $(\alpha_0(u), \beta_0')' \in$ interior $\Theta$ for each $u \in (0,1)$.

Lemma 15. Conditions Z.1-Z.2(b) hold for this example with moment function given by (F.4), $T = (0,1)$, $Q_Y(u|x) = x'\beta_0 + \alpha_0(u)$,

\[ G(\alpha_0(u),\beta_0,u) = -\begin{bmatrix} f_V(\alpha_0(u)) & f_V(\alpha_0(u)) E[X]' \\ 0_{p \times 1} & EXX' \end{bmatrix}, \quad (F.5) \]

and

\[ \Omega(u,\tilde u) = \begin{bmatrix} \min(u,\tilde u) - u\tilde u & -E[V 1\{V \leq \alpha_0(u)\}] E[X]' \\ -E[V 1\{V \leq \alpha_0(\tilde u)\}] E[X] & E[V^2] EXX' \end{bmatrix}. \quad (F.6) \]
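
The two blocks of the moment function (F.4) correspond to a two-step estimator: least squares for $\beta_0$, followed by a sample quantile of the residuals for $\alpha_0(u)$. The sketch below illustrates this on simulated data and evaluates the sample moments at the estimates. The design, the function names, and the use of `np.quantile` with its default interpolation are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def location_moments(y, x, alpha, beta, u):
    # Sample analog of (F.4): the quantile equation for alpha stacked on
    # the least squares normal equations for beta.
    resid = y - x @ beta
    quantile_eq = np.mean(u - (resid <= alpha))
    ols_eq = (resid[:, None] * x).mean(axis=0)
    return np.concatenate([[quantile_eq], ols_eq])

def fit_location(y, x, u):
    # Two-step estimator: OLS for beta, then the u-quantile of the residuals.
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    alpha_hat = np.quantile(y - x @ beta_hat, u)
    return alpha_hat, beta_hat

rng = np.random.default_rng(2)
n = 5000
# Hypothetical design: intercept plus one uniform regressor, N(0, 1) errors.
x = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])
y = x @ np.array([1.0, 2.0]) + rng.standard_normal(n)
alpha_hat, beta_hat = fit_location(y, x, 0.9)
moments = location_moments(y, x, alpha_hat, beta_hat, 0.9)
```

The least squares block of `moments` is zero up to floating-point error, while the quantile block is zero only up to a term of order $1/n$, which is why the estimator is treated as an approximate Z-estimator.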


Proof of Lemma 15. The proof follows analogously to the proof of Lemma 14. Uniqueness of roots can also be argued similarly, with $\beta_0$ uniquely solving the least squares normal equation, and $\alpha_0$ uniquely solving the quantile equation. □

F.2. Examples of conditional distribution function estimation methods. We consider the distribution regression model described in the text and an alternative estimator for the duration model based on distribution regression.

Example 4. Distribution regression. The conditional distribution function of the outcome variable $Y$ given the covariate vector $X$ is given by $\Lambda(X'\beta_0(\cdot))$, where $\Lambda$ is either the probit or the logit link function. Here we can take the moment functions corresponding to the pointwise maximum likelihood estimation:

\[ g(W,\beta,y) = \frac{\Lambda(X'\beta) - 1\{Y \leq y\}}{\Lambda(X'\beta)(1 - \Lambda(X'\beta))} \lambda(X'\beta) X, \quad (F.7) \]

where $\lambda$ is the derivative of $\Lambda$. Let $\mathcal{Y}$ be either a finite set or a bounded open subset of $\mathbb{R}^d$. For the latter case we assume that the conditional distribution function $y \mapsto F_Y(y|X)$ admits a density $y \mapsto f_Y(y|X)$, which is continuous at each $y \in \mathcal{Y}$, a.s. Moreover, $EXX'$ is finite and of full rank; the true parameter value $\beta_0(y)$ belongs to the interior of the parameter space $\Theta$ for each $y \in \mathcal{Y}$; and $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s.

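Lemma 16. Conditions Z.1-Z.2(a) hold for this example with moment function given by (F.7), $T = \mathcal{Y}$, $u = y$, $F_Y(y|x) = \Lambda(x'\beta_0(y))$,

\[ G(\beta_0(y),y) := E\left[\frac{\lambda(X'\beta_0(y))^2}{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(y))]} XX'\right], \]

and, for $y \geq \tilde y$,

\[ \Omega(y,\tilde y) = E\left[\frac{\lambda(X'\beta_0(y))\,\lambda(X'\beta_0(\tilde y))}{\Lambda(X'\beta_0(y))[1 - \Lambda(X'\beta_0(\tilde y))]} XX'\right]. \]
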

Proof of Lemma 16. We consider the case where $\mathcal{Y}$ is a bounded open subset of $\mathbb{R}^d$. The case where $\mathcal{Y}$ is a finite set is simpler and follows similarly.

To show Z.1, we need to verify conditions on the derivatives of the map $\beta \mapsto E g(W,\beta,y)$. By a straightforward calculation we have that at $(\beta,y) = (\beta_0(y),y)$,

\[ \frac{\partial}{\partial(\beta',y)} E g(W,\beta,y) = \left[E \frac{\partial}{\partial\beta'} g(W,\beta,y), \frac{\partial}{\partial y} E g(W,\beta,y)\right] =: [G(\beta,y), R(\beta,y)], \]

where, for $H(z) = \lambda(z)/\{\Lambda(z)[1 - \Lambda(z)]\}$ and $h(z) = dH(z)/dz$,

\[ G(\beta,y) = E[\{h(X'\beta)[\Lambda(X'\beta) - 1\{Y \leq y\}] + H(X'\beta)\lambda(X'\beta)\} XX'], \]
\[ R(\beta,y) = -E[H(X'\beta) f_Y(y|X) X]. \]

Both terms are continuous in $(\beta,y)$ at $(\beta_0(y),y)$ for each $y \in \mathcal{Y}$. This follows from the dominated convergence theorem and the following ingredients: (1) a.s. continuity of the map $(\beta,y) \mapsto \frac{\partial}{\partial\beta'} g(W,\beta,y)$, (2) domination of $\|\frac{\partial}{\partial\beta'} g(W,\beta,y)\|$ by a square-integrable function const$\|X\|$, (3) a.s. continuity of the conditional density function $y \mapsto f_Y(y|X)$, and (4) $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s. Finally, also note that the solution $\beta_0(y)$ to $E g(W,\beta,y) = 0$ is unique for each $y \in \mathcal{Y}$ because it is a root of a gradient of a convex function.

To show Z.1(iii), we verify that the function class $\mathcal{G}$ is P-Donsker with a square integrable envelope. Function classes $\mathcal{F}_1 = \{X'\beta, \beta \in \Theta\}$, $\mathcal{F}_2 = \{1\{Y \leq y\}, y \in \mathcal{Y}\}$, and $\{X_j\}$, $j = 1, \ldots, p$, are VC classes of functions. The final class

\[ \mathcal{G} = \left\{\frac{\Lambda(\mathcal{F}_1) - \mathcal{F}_2}{\Lambda(\mathcal{F}_1)(1 - \Lambda(\mathcal{F}_1))} \lambda(\mathcal{F}_1) X_j, \ j = 1, \ldots, p\right\} \]

is a Lipschitz transformation of VC classes with Lipschitz coefficient bounded by $c \max_j |X_j|$ and the envelope function $c' \max_j |X_j|$, which are square-integrable; here $c$ and $c'$ are some positive constants. Hence $\mathcal{G}$ is Donsker by Example 19.9 in van der Vaart (1998). Finally, the map $(\beta,y) \mapsto g(W,\beta,y)$, with $g$ given by (F.7), is continuous at each $(\beta,y) \in \Theta \times \mathcal{Y}$ with probability one by the absolute continuity of the conditional distribution of $Y$ and by the assumption that $\Lambda(X'\beta)(1 - \Lambda(X'\beta)) > c > 0$ uniformly on $\beta \in \Theta$, a.s.

To show Z.2(a), we note that the map $(x,\theta) \mapsto \Lambda(x'\theta)$ trivially verifies the hypotheses of Z.2(a) provided the set $\mathcal{X}$ is compact. □
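
For the logit link, $\lambda = \Lambda(1 - \Lambda)$, so the moment function (F.7) reduces to $(\Lambda(X'\beta) - 1\{Y \leq y\})X$ and the expressions for $G(\beta_0(y),y)$ and $\Omega(y,y)$ in Lemma 16 coincide. The sketch below checks this on simulated data. The design, in which $Y$ equals a linear index plus logistic noise so that the distribution regression model holds exactly, and all names are hypothetical and chosen only for illustration.

```python
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

def dr_moment(y, x, beta, y_cut):
    # (F.7) with the logit link: lambda = Lambda * (1 - Lambda), so the
    # moment reduces to (Lambda(X'beta) - 1{Y <= y_cut}) X.
    return (logistic_cdf(x @ beta) - (y <= y_cut))[:, None] * x

rng = np.random.default_rng(3)
n = 200_000
x = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])
# Hypothetical design: Y = 0.5*X1 + logistic noise, so that
# F_Y(y|x) = Lambda(y - 0.5*x1) = Lambda(x'beta0(y)) with beta0(y) = (y, -0.5)'.
y = 0.5 * x[:, 1] + rng.logistic(0.0, 1.0, size=n)
y_cut = 0.3
beta0 = np.array([y_cut, -0.5])
g = dr_moment(y, x, beta0, y_cut)
lam = logistic_cdf(x @ beta0)
mean_moment = g.mean(axis=0)                                # sample analog of E g = 0
omega_hat = g.T @ g / n                                     # sample Omega(y, y)
g_matrix = (x * (lam * (1.0 - lam))[:, None]).T @ x / n     # sample G(beta0(y), y)
```

In this design `mean_moment` should be near zero, and `omega_hat` should be close to `g_matrix`, up to simulation noise.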

Example 3b. Duration regression. An alternative to the proportional hazard model in duration and survival analysis is to specify the conditional distribution function of the duration $Y$ given the covariate vector $X$ as $\Lambda(\alpha_0(\cdot) + X'\beta_0)$, where $\Lambda$ is either the probit or the logit link function. We normalize $\alpha_0(y_0) = 0$ at some $y_0 \in \mathcal{Y}$. Here we can take the following moment functions:

\[ g(W,\alpha,\beta,y) = \begin{bmatrix} \dfrac{\Lambda(\alpha + X'\beta) - 1\{Y \leq y\}}{\Lambda(\alpha + X'\beta)(1 - \Lambda(\alpha + X'\beta))} \lambda(\alpha + X'\beta) \\[2ex] \dfrac{\Lambda(X'\beta) - 1\{Y \leq y_0\}}{\Lambda(X'\beta)(1 - \Lambda(X'\beta))} \lambda(X'\beta) X \end{bmatrix}, \]

where $\lambda$ is the derivative of $\Lambda$. The first set of equations is used for estimation of $\alpha_0(y)$ and the second for estimation of $\beta_0$.

Let $\mathcal{Y}$ be either a finite set or a bounded open subset of $\mathbb{R}^d$. For the latter case we assume that the conditional distribution function $y \mapsto F_Y(y|X)$ admits a density $y \mapsto f_Y(y|X)$, which is continuous at each $y \in \mathcal{Y}$, a.s. Moreover, $EXX'$ is finite and of full rank; the true parameter value $(\alpha_0(y), \beta_0')'$ belongs to the interior of the parameter space $\Theta$ for each $y \in \mathcal{Y}$; and $\Lambda(\alpha + X'\beta)(1 - \Lambda(\alpha + X'\beta)) > c > 0$ uniformly on $(\alpha, \beta')' \in \Theta$, a.s.

Lemma 17. Conditions Z.1-Z.2(a) hold for this example with moment function given by (F.7), $T = \mathcal{Y}$, $u = y$, $F_Y(y|x) = \Lambda(\alpha_0(y) + x'\beta_0)$,

\[ G(\alpha_0(y),\beta_0,y) := E \frac{\partial}{\partial(\alpha,\beta')} g(W,\alpha_0(y),\beta_0,y), \]

and $\Omega(y,\tilde y) = E[g(W,\alpha_0(y),\beta_0,y)\, g(W,\alpha_0(\tilde y),\beta_0,\tilde y)']$.

Proof of Lemma 17. The proof follows analogously to the proof of Lemma 16. □

Table 1: Decomposing Changes in Measures of Wage Dispersion: 1979-1988, DR

                                   Effect of:
Statistic           Total change   Minimum wage   Unions         Individual     Coefficients
                                                                 attributes
Men:
Standard Deviation  8.0 (0.3)      2.8 (0.1)      0.7 (0.0)      1.8 (0.2)      2.7 (0.3)
                                   35.4 (1.4)     8.5 (0.6)      22.9 (1.9)     33.1 (2.4)
90-10               21.5 (1.0)     11.2 (0.1)     0.0 (0.0)      9.2 (0.8)      1.1 (1.3)
                                   52.1 (2.4)     0.0 (0.1)      42.6 (4.4)     5.3 (5.9)
50-10               11.3 (1.4)     11.2 (0.1)     -2.0 (1.0)     5.1 (0.4)      -3.1 (1.1)
                                   99.6 (14.1)    7.9 (1.2)      45.5 (8.3)     27.1 (14.0)
90-50               10.2 (1.2)     0.0 (0.0)      2.0 (1.0)      4.0 (0.8)      4.2 (1.1)
                                   0.0 (0.0)      19.7 (8.4)     39.3 (8.8)     41.0 (9.8)
75-25               15.4 (1.1)     0.0 (0.0)      4.1 (1.0)      0.3 (1.3)      11.1 (1.2)
                                   0.0 (0.0)      26.5 (6.2)     1.7 (8.6)      71.8 (8.7)
95-5                33.0 (2.1)     23.0 (0.7)     0.0 (0.6)      8.5 (1.1)      1.4 (1.5)
                                   69.9 (4.1)     0.0 (1.7)      25.8 (2.6)     4.3 (4.4)
Gini coefficient    4.1 (0.1)      1.3 (0.0)      0.5 (0.0)      0.3 (0.1)      2.0 (0.1)
                                   32.1 (1.2)     11.7 (0.6)     6.8 (1.8)      49.4 (1.8)

Women:
Standard Deviation  10.9 (0.4)     3.8 (0.1)      0.3 (0.0)      4.7 (0.2)      2.1 (0.3)
                                   34.9 (1.5)     3.2 (0.4)      42.8 (1.8)     19.1 (2.5)
90-10               39.8 (1.4)     23.0 (0.2)     0.9 (0.5)      14.5 (0.7)     1.3 (1.1)
                                   57.9 (1.9)     2.3 (1.2)      36.4 (1.7)     3.4 (2.6)
50-10               33.0 (0.7)     23.0 (0.2)     0.0 (0.1)      11.3 (0.4)     -1.4 (0.7)
                                   69.9 (1.6)     0.0 (0.4)      34.4 (1.3)     -4.3 (2.4)
90-50               6.8 (1.4)      0.0 (0.0)      0.9 (0.5)      3.1 (0.8)      2.8 (1.4)
                                   0.0 (0.0)      13.6 (7.2)     46.0 (11.3)    40.3 (9.9)
75-25               12.8 (0.9)     0.0 (0.0)      0.0 (0.5)      8.3 (0.2)      4.5 (0.8)
                                   0.0 (0.0)      0.0 (3.9)      65.1 (5.0)     35.0 (4.5)
95-5                38.8 (1.9)     16.8 (0.5)     0.7 (0.7)      16.4 (2.0)     5.0 (2.1)
                                   43.2 (2.2)     1.9 (1.9)      42.1 (5.0)     12.8 (5.1)
Gini coefficient    4.0 (0.1)      2.0 (0.1)      0.1 (0.0)      1.0 (0.1)      0.9 (0.1)
                                   49.0 (1.8)     3.5 (0.4)      24.5 (1.4)     23.0 (2.2)

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The distribution regression model has been applied.

Table 2: Decomposing Changes in Measures of Wage Dispersion: 1979-1988, CDR


Effect  of: 


Statistic 


Total  change 


Minimum 
wage 


Unions 


Individual 
attributes 


Coefficients 


Men: 

Standard 

Deviation 

90-10 

50-10 

90-50 

75-25 

95-5 

Gini 

coefficient 

Women: 

Standard 

Deviation 

90-10 

50-10 

90-50 

75-25 
95-5 

Gini 

coefficient 


21 


11 


10 


15 


36 


12 


43 


36 


12 


52 


.2(0.3) 
,5(1.0) 
.3(1.4) 
.2(1.2) 
.4(1.1) 
.4(2.1) 
.2(0.1) 

.7(0.4) 
.2(1.4) 
,4  (0.7) 
.8(1.4) 
.8  (0.9) 
.7(1.9) 
.9(0.1) 


3.3 

0.0) 

40.7 

1.4) 

11.2 

0.1) 

52.1 

2.4) 

11.2 

0.1) 

9.6(1 

4.1) 

0.0 

0,0) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

0.0) 

26.4 

0.7) 

72.7 

3.8) 

1.6 

0.0) 

37.9 

1,1) 

5.6 

0,1) 

44.1 

1.5) 

26.4 

0.2) 

61.2 

1.9) 

26.4 

0.2) 

72.7 

1.6) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

0.0) 

30.6 

0.5) 

58.1 

2.2) 

2.9 

(0.1) 

59.2 

1.8) 

0.6 

0.0) 

7.9 

0,5) 

0.0 

0.0) 

0,0 

0,1) 

-2.0 

1.0) 

-17.9  ( 

1.2) 

2.0 

1.0) 

19.7 

8.4) 

4.1 

1.0) 

26.5 

6.2) 

0.0 

0.6) 

0.0 

1.5) 

0.4 

0.0) 

10,7 

0.5) 

0.3 

0.0) 

2.2 

0.4) 

0.9 

(0.5) 

2.2 

(1.2) 

0.0 

0.1) 

0.0 

0.4) 

0.9 

0.5) 

13.6 

(7.2) 

0.0 

0,5) 

0.0 

'3.9) 

0.7 

0.7) 

1.4 

1.9) 

0.1 

0,0) 

1.9 

0.4) 

1.9 

0.2) 

22.5 

1.8) 

9.2 

0.8) 

42,6 

4.4) 

5,1 

0,4) 

45.5 

8.3) 

4,0 

0,8) 

39.3 

8,8) 

0,3 

1,3) 

1.7 

8,6) 

8.5 

1,1) 

23,4 

2.7) 

0,3 

0.1) 

7,1 

1.6) 

5,1 

0,2) 

39,9 

1.8) 

14,5 

0.7) 

33,5 

1.7) 

11,3 

0.4) 

31,2 

1.3) 

3.1 

0.8) 

46,0  ( 

1.3) 

83 

0.2) 

65,1 

5.0) 

16.7 

2.0) 

31,6 

5.0) 

1.3 

0.1) 

26.1 

1,4) 

2.4 

28.9 

1.1 

5.3 

-3,1 

27,2  ( 

4,2 

41,0 

11,1 

71,8 

1.4 

3.9 

1,8 

44,2 

1,7 
13,8 
13,0 

3,1 
-1.4 
-3,9 

2,8 
40,3 

4,5 
35,0 

4.7 

8,8 

0,6 
12.8 


(0,2) 

(2,4) 

(1.3) 

(5.9) 

(1.1) 

14.0) 

(1.1) 

(9.8) 

(1.2) 

(8.7) 

(1.5) 

(4.0) 

(0.1) 

(1.6) 

(0.3) 

(2.5) 

(1.1) 
(2.6) 

(0.7) 

(2.4) 

(1.4) 

(9,9) 

(0,8) 

(4,5) 

(2,1) 

(5,1) 

(0,1) 

(2.2) 

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The censored distribution regression model has been applied.

Table 3: Decomposing Changes in Measures of Wage Dispersion: 1979-1988, CQR

Effect  of: 


Statistic 


Minimum 
Total  change      wage 


Unions 


Individual 
attributes 


Coefficients 


Men: 

Standard 

Deviation 

90-10 

50-10 

90-50 

75-25 

95-5 

Gini 
coefficient 

Women: 

Standard 

Deviation 

90-10 

50-10 

90-50 

75-25    : 

95-5 

Gini 
coefficient 


22 


12 


12 


39 


13 


48 


37. 


11 


15 


50. 


5. 


,0(0.3) 
3  (I.I) 
,5(0.9) 
7(0.7) 
,7(0.6) 
,2(0.8) 
,5(0.1) 

I  (0.4) 
,8(1.2) 
2(0.7) 
5  (0.9) 
3  (0.9) 
8(1.4) 
1  (O.I) 


4.1 

0.0) 

45.3 

1.5) 

14.2 

0.4) 

63.6 

'3.4) 

14.2 

'0.4) 

148.7  ( 

6.7) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

'0.0) 

30.6 

0.0) 

78.1 

,1.8) 

1.9 

,0.0) 

42.2 

I.I) 

6.2 

0.0) 

47.6 

.1.6) 

30.6 

0.0) 

62.8 

1.5) 

30.6 

0.0) 

82,3 

1.6) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

0.0) 

0.0 

0.0) 

30.6 

0.0) 

60.3 

,1.6) 

3.2 

'0.0) 

62.5 

1.5) 

0.3 

0.0) 

3.2 

0.5) 

-0.5 

O.I) 

-2.2 

0.6) 

-1.8 

0.1) 

18.7 

3.0) 

1.3 

0.1) 

10. 1 

1.0) 

1.6 

0.1) 

12.9 

1.2) 

-0.5 

O.I) 

-1.2 

0.3) 

0.3 

0.0) 

5.9 

0.4) 

0.3 

O.I) 

2.6 

0.4) 

0.8 

0.2) 

1.6 

0.3) 

-0.3 

0.1) 

-0.7 

0.3) 

I.l 

0.1) 

9.1 

1.1) 

0.8 

0.1) 

5.6 

0.7) 

l.I 

0.2) 

2.1 

0.4) 

0.1 

0.0) 

2.1 

0.3) 

1.8 

O.I) 

20.0 

1.6) 

7.2 

0.4) 

32.3 

2.8) 

4.6 

0.4) 

47.9 

9.0) 

2.6 

0.3) 

20.6 

2,4) 

2.0 

0.4) 

15.5 

3.0) 

7.4 

0.5) 

18.9 

1.2) 

0.3 

O.I) 

6.1 

1.4) 

4.5 

0.3) 

34.8 

1.5) 

14.7 

0.8) 

30.1 

1.3) 

10.9 

0.8) 

29.4 

1.7) 

3.7 

0.5) 

32,3 

3.3) 

11.8 

0.8) 

77.6 

5.2) 

15.1 

0.8) 

29.7 

1.2) 

I.I 

O.I) 

21.8 

1.2) 

2.8 

(0.2) 

31.5 

(2.2) 

1.4 

(1.1) 

6.4 

(5.1) 

-7.4 

(0.9) 

78.0(21.6) 

8.8 

(0.5) 

69.4 

(2.5) 

9.1 

(0.5) 

71.5 

(3.1) 

1.6 

(0.8) 

4.2 

(2.1) 

2.1 

(0.1) 

45.9 

(1.4) 

2.0 

(0.3) 

15.0 

(1.9) 

2.7 

(0,8) 

5.5 

(1,6) 

-4.1 

(0,5) 

-10.9 

(1,4) 

6.7 

(0,8) 

58.5 

(3,5) 

2.6 

(0.9) 

16.9 

(5,1) 

4,0 

(1,0) 

7.9 

(1.8) 

0.7 

(0,1) 

13.6 

(1.5) 

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The censored quantile regression model has been applied.

Figure 1. Empirical CDFs and 95% simultaneous confidence intervals for observed wages in 1979 and 1988. Distributions for men are plotted in the upper panel and distributions for women are plotted in the bottom panel. Confidence intervals were obtained by bootstrap with 100 repetitions. Vertical lines are the levels of the minimum wage.

Figure 2. 95% simultaneous confidence intervals for observed quantile functions, observed quantile policy effects and decomposition of the quantile policy effects for men. Confidence intervals were obtained by bootstrap with 100 repetitions.

Figure 3. 95% simultaneous confidence intervals for observed quantile functions, observed quantile policy effects and decomposition of the quantile policy effects for women. Confidence intervals were obtained by bootstrap with 100 repetitions.

Figure 4. 95% simultaneous confidence intervals for observed distribution functions, observed distribution policy effects and decomposition of the distribution policy effects for men. Confidence intervals were obtained by bootstrap with 100 repetitions.

Figure 5. 95% simultaneous confidence intervals for observed distribution functions, observed distribution policy effects and decomposition of the distribution policy effects for women. Confidence intervals were obtained by bootstrap with 100 repetitions.

Figure 6. 95% simultaneous confidence intervals for observed Lorenz, observed Lorenz policy effects and decomposition of the Lorenz policy effects for men. Confidence intervals were obtained by bootstrap with 100 repetitions.

Figure 7. 95% simultaneous confidence intervals for observed Lorenz, observed Lorenz policy effects and decomposition of the Lorenz policy effects for women. Confidence intervals were obtained by bootstrap with 100 repetitions.

Figure 8. Comparison of distribution regression, censored distribution regression and censored quantile regression estimates of the decomposition of quantile policy effects for men.

Figure 9. Comparison of distribution regression, censored distribution regression and censored quantile regression estimates of the decomposition of quantile policy effects for women.


Table A1: Reversing the order of the decomposition: 1979-1988, DR

                                   Effect of:
Statistic           Total change   Individual     Unions         Minimum wage   Coefficients
                                   attributes
Men:
Standard Deviation  8.0 (0.3)      0.9 (0.2)      1.5 (0.1)      2.9 (0.2)      2.7 (0.3)
                                   11.4 (2.8)     19.2 (1.0)     36.3 (2.8)     33.1 (2.4)
90-10               21.5 (1.0)     0.4 (1.3)      8.8 (1.2)      11.2 (0.9)     1.1 (1.3)
                                   1.8 (5.8)      40.7 (5.6)     52.1 (4.8)     5.3 (5.9)
50-10               11.3 (1.4)     2.5 (1.5)      0.7 (1.2)      11.2 (0.5)     -3.1 (1.1)
                                   21.8 (12.7)    5.8 (11.3)     99.6 (15.5)    27.1 (4.0)
90-50               10.2 (1.2)     -2.1 (1.2)     8.1 (0.8)      0.0 (0.9)      4.2 (1.1)
                                   -20.1 (12.9)   79.1 (0.7)     0.0 (9.5)      41.0 (9.8)
75-25               15.4 (1.1)     6.4 (1.2)      -2.1 (1.3)     0.0 (1.0)      11.1 (1.2)
                                   41.6 (6.8)     -13.4 (9.2)    0.0 (7.4)      71.8 (8.7)
95-5                33.0 (2.1)     2.6 (1.7)      5.9 (1.1)      23.0 (1.0)     1.4 (1.5)
                                   7.9 (4.8)      17.9 (3.4)     69.9 (5.5)     4.3 (4.4)
Gini coefficient    4.1 (0.1)      -0.3 (0.1)     1.0 (0.0)      1.4 (0.1)      2.0 (0.1)
                                   -6.8 (2.8)     24.1 (1.1)     33.3 (2.5)     49.4 (1.8)

Women:
Standard Deviation  10.9 (0.4)     4.5 (0.2)      0.0 (0.0)      4.4 (0.3)      2.1 (0.3)
                                   41.1 (2.4)     -0.2 (0.2)     40.0 (2.4)     19.1 (2.5)
90-10               39.8 (1.4)     11.2 (0.7)     0.0 (0.4)      27.2 (0.2)     1.3 (1.1)
                                   28.2 (1.6)     0.0 (1.0)      68.4 (2.3)     3.4 (2.6)
50-10               33.0 (0.7)     7.9 (0.8)      -0.8 (0.5)     22.7 (0.5)     -1.4 (0.7)
                                   24.1 (2.6)     -2.4 (1.7)     82.6 (2.2)     -4.3 (2.4)
90-50               6.8 (1.4)      3.3 (0.8)      0.8 (0.5)      0.0 (0.5)      2.8 (1.4)
                                   47.9 (13.4)    11.7 (7.2)     0.0 (6.5)      40.3 (9.9)
75-25               12.8 (0.9)     2.8 (0.7)      0.0 (0.5)      5.5 (0.3)      4.5 (0.8)
                                   22.0 (5.6)     0.0 (3.9)      43.0 (3.5)     35.0 (4.5)
95-5                38.8 (1.9)     17.4 (0.9)     0.0 (0.3)      16.5 (2.0)     5.0 (2.1)
                                   44.8 (3.0)     0.0 (0.9)      42.4 (5.1)     12.8 (5.1)
Gini coefficient    4.0 (0.1)      0.6 (0.1)      0.0 (0.0)      2.5 (0.1)      0.9 (0.1)
                                   14.7 (2.4)     1.2 (0.3)      61.1 (2.7)     23.0 (2.2)

Notes: All numbers are in %. Bootstrapped standard errors are given in parentheses. The second line in each cell indicates the percentage of total variation. The distribution regression model has been applied.


